Preface (Part II)



This book, Part 2 - Polynomials and Canonical Forms, covers Chapters 6 through 8 of the book 
A Comprehensive Introduction to Linear Algebra (Addison-Wesley, 1986), by Joel G. Broida and
S. Gill Williamson. Chapter 6, Polynomials, will be review to some readers and new to others. 
Chapters 7 and 8 supplement and extend ideas developed in Part I, Basic Linear Algebra, and 
introduce the very powerful method of canonical forms. The original Preface, Contents and 
Index are included. 
Preface (Parts I, II, III)



As a text, this book is intended for upper division undergraduate and begin- 
ning graduate students in mathematics, applied mathematics, and fields of 
science and engineering that rely heavily on mathematical methods. However, 
it has been organized with particular concern for workers in these diverse 
fields who want to review the subject of linear algebra. In other words, we 
have written a book which we hope will still be referred to long after any final 
exam is over. As a result, we have included far more material than can possi- 
bly be covered in a single semester or quarter. This accomplishes at least two 
things. First, it provides the basis for a wide range of possible courses that can 
be tailored to the needs of the student or the desire of the instructor. And 
second, it becomes much easier for the student to later learn the basics of 
several more advanced topics such as tensors and infinite-dimensional vector 
spaces from a point of view coherent with elementary linear algebra. Indeed, 
we hope that this text will be quite useful for self-study. Because of this, our 
proofs are extremely detailed and should allow the instructor extra time to 
work out exercises and provide additional examples if desired. 
A major concern in writing this book has been to develop a text that
addresses the exceptional diversity of the audience that needs to know some- 
thing about the subject of linear algebra. Although seldom explicitly 
acknowledged, one of the central difficulties in teaching a linear algebra 
course to advanced students is that they have been exposed to the basic back- 
ground material from many different sources and points of view. An experi- 
enced mathematician will see the essential equivalence of these points of 
view, but these same differences seem large and very formidable to the 
students. An engineering student, for example, can waste an inordinate amount
of time because of some trivial mathematical concept missing from their
background. A mathematics student might have had a concept from a different
point of view and not realize the equivalence of that point of view to the one 
currently required. Although such problems can arise in any advanced mathe- 
matics course, they seem to be particularly acute in linear algebra. 

To address this problem of student diversity, we have written a very self- 
contained text by including a large amount of background material necessary 
for a more advanced understanding of linear algebra. The most elementary of 
this material constitutes Chapter 0, and some basic analysis is presented in 
three appendices. In addition, we present a thorough introduction to those 
aspects of abstract algebra, including groups, rings, fields and polynomials 
over fields, that relate directly to linear algebra. This material includes both 
points that may seem "trivial" as well as more advanced background material. 
While trivial points can be quickly skipped by the reader who knows them 
already, they can cause discouraging delays for some students if omitted. It is 
for this reason that we have tried to err on the side of over-explaining 
concepts, especially when these concepts appear in slightly altered forms. The 
more advanced reader can gloss over these details, but they are there for those 
who need them. We hope that more experienced mathematicians will forgive 
our repetitive justification of numerous facts throughout the text. 

A glance at the Contents shows that we have covered those topics nor- 
mally included in any linear algebra text although, as explained above, to a 
greater level of detail than other books. Where we differ significantly in con- 
tent from most linear algebra texts however, is in our treatment of canonical 
forms (Chapter 8), tensors (Chapter 11), and infinite-dimensional vector 
spaces (Chapter 12). In particular, our treatment of the Jordan and rational 
canonical forms in Chapter 8 is based entirely on invariant factors and the
Smith normal form of a matrix. We feel this approach is well worth the effort
required to learn it since the result is, at least conceptually, a constructive 
algorithm for computing the Jordan and rational forms of a matrix. However, 
later sections of the chapter tie together this approach with the more standard 
treatment in terms of cyclic subspaces. Chapter 11 presents the basic formalism
of tensors as they are most commonly used by applied mathematicians,
physicists and engineers. While most students first learn this material in a 
course on differential geometry, it is clear that virtually all the theory can be 
easily presented at this level, and the extension to differentiable manifolds 
then becomes only a technical exercise. Since this approach is all that most 
scientists ever need, we leave more general treatments to advanced courses on 
abstract algebra. Finally, Chapter 12 serves as an introduction to the theory of 
infinite-dimensional vector spaces. We felt it was desirable to give the student
some idea of the problems associated with infinite-dimensional spaces and 
how they are to be handled. And in addition, physics students and others 
studying quantum mechanics should have some understanding of how linear 
operators and their adjoints are properly defined in a Hilbert space. 

One major topic we have not treated at all is that of numerical methods. 
The main reason for this (other than that the book would have become too 
unwieldy) is that we feel at this level, the student who needs to know such 
techniques usually takes a separate course devoted entirely to the subject of 
numerical analysis. However, as a natural supplement to the present text, we 
suggest the very readable "Numerical Analysis" by I. Jacques and C. Judd 
(Chapman and Hall, 1987). 

The problems in this text have been accumulated over 25 years of teaching 
the subject of linear algebra. The more of these problems that the students 
work the better. Be particularly wary of the attitude that assumes that some of 
these problems are "obvious" and need not be written out or precisely articu- 
lated. There are many surprises in the problems that will be missed from this 
approach! While these exercises are of varying degrees of difficulty, we have 
not distinguished any as being particularly difficult. However, the level of dif- 
ficulty ranges from routine calculations that everyone reading this book 
should be able to complete, to some that will require a fair amount of thought 
from most students. 

Because of the wide range of backgrounds, interests and goals of both 
students and instructors, there is little point in our recommending a particular 
course outline based on this book. We prefer instead to leave it up to each
teacher individually to decide exactly what material should be covered to meet 
the needs of the students. While at least portions of the first seven chapters 
should be read in order, the remaining chapters are essentially independent of 
each other. Those sections that are essentially applications of previous 
concepts, or else are not necessary for the rest of the book are denoted by an 
asterisk (*). 

Now for one last comment on our notation. We use the symbol ■ to denote 
the end of a proof, and / to denote the end of an example. Sections are labeled 
in the format "Chapter.Section," and exercises are labeled in the format
"Chapter.Section.Exercise." For example, Exercise 2.3.4 refers to Exercise 4
of Section 2.3, i.e., Section 3 of Chapter 2. Books listed in the bibliography
are referred to by author and copyright date. 
CHAPTER 6



Polynomials 



Before continuing with our treatment of linear operators and transformations, 
we must make a digression and consider the theory of polynomials in some 
detail. The subject matter of this chapter will be quite important throughout 
much of the remainder of this text. Our basic goal is to discuss the factoriza- 
tion of polynomials in detail, including many of the elementary properties that 
we all learned in high school. 



6.1 DEFINITIONS 

Let $\mathcal{F}$ be a field. In high school (or earlier), we all learned that a polynomial
$p(x)$ in the indeterminate (or variable) $x$ is basically an expression of the form

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

where $n$ is any nonnegative integer and $a_0, \ldots, a_n$ are all elements of $\mathcal{F}$. Note
that our elementary experience with polynomials tells us that if

$$q(x) = b_0 + b_1 x + \cdots + b_m x^m$$

is another polynomial in $x$ then, assuming without loss of generality that $n \geq m$,
we have (where we define $a_j = 0$ for $j > n$ and $b_j = 0$ for $j > m$)

$$p(x) + q(x) = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n$$

and

$$\begin{aligned}
p(x)q(x) = {} & a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 \\
& + \cdots + (a_0 b_k + a_1 b_{k-1} + \cdots + a_{k-1} b_1 + a_k b_0)x^k \\
& + \cdots + a_n b_m x^{n+m} .
\end{aligned}$$

While this has a definite intuitive appeal (to previous experience), the term 
"expression" in the above definition of polynomial is rather nebulous, and it is 
worth making this definition somewhat more precise. To accomplish this, we 
focus our attention on the coefficients $a_i$.

We define a polynomial over $\mathcal{F}$ to be an (infinite) sequence of scalars

$$p = \{a_0, a_1, a_2, \ldots\}$$

such that $a_n = 0$ for all but finitely many $n$. The scalars $a_i$ are called the
coefficients of the polynomial. If

$$q = \{b_0, b_1, b_2, \ldots\}$$

is another polynomial over $\mathcal{F}$, then $p = q$ if and only if $a_i = b_i$ for every $i$. As we
did for vector n-tuples, we define the addition of two polynomials p and q by

$$p + q = \{a_0 + b_0, a_1 + b_1, \ldots\} .$$

Furthermore, we now also define the multiplication of p and q by

$$pq = \{c_0, c_1, c_2, \ldots\}$$

where

$$c_k = \sum_{i+j=k} a_i b_j = \sum_{t=0}^{k} a_t b_{k-t} = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0 .$$

Since p and q have a finite number of nonzero terms, so do both p + q and pq,
and hence both p + q and pq are also polynomials.

We claim that the set of all polynomials over $\mathcal{F}$ forms a ring. Indeed, if we
define the zero polynomial to be the sequence $\{0, 0, \ldots\}$, and the negative of
any polynomial $\{a_0, a_1, \ldots\}$ to be the polynomial $\{-a_0, -a_1, \ldots\}$, then
axioms (R1) - (R6) for a ring given in Section 1.4 are clearly satisfied. As to
axiom (R7), let $p = \{a_0, a_1, \ldots\}$, $q = \{b_0, b_1, \ldots\}$ and $r = \{c_0, c_1, \ldots\}$. Then
the kth coefficient of $(pq)r$ is the sum (using the associative property of $\mathcal{F}$)

$$\sum_{i+j=k} \Bigl( \sum_{m+n=i} a_m b_n \Bigr) c_j = \sum_{m+n+j=k} (a_m b_n) c_j = \sum_{m+n+j=k} a_m (b_n c_j) = \sum_{m+i=k} a_m \Bigl( \sum_{n+j=i} b_n c_j \Bigr) .$$



But this last expression is just the kth coefficient of $p(qr)$. Finally, to prove
axiom (R8), we use the distributive property of $\mathcal{F}$ to see that the kth coefficient
of $p(q + r)$ is

$$\sum_{i+j=k} a_i (b_j + c_j) = \sum_{i+j=k} a_i b_j + \sum_{i+j=k} a_i c_j .$$



Again we see that this last expression is just the kth coefficient of pq + pr. 
Similarly, it is easy to see that (p + q)r = pr + qr. 

It should be clear that the ring of polynomials is commutative, and that if 1
is the unit element of $\mathcal{F}$, then $\{1, 0, 0, \ldots\}$ is a unit element for the ring of
polynomials. However, since an arbitrary polynomial does not have a multiplicative
inverse, the ring of polynomials does not form a field (see Theorem
6.2, Corollary 3 below).

Example 6.1 Consider the polynomials

$$p = \{0, 1, 0, 0, \ldots\}, \qquad q = \{1, 2, -1, 0, \ldots\} .$$

Then

$$p + q = \{1, 3, -1, 0, \ldots\}$$

and

$$\begin{aligned}
pq &= \{0(1),\ 0(2) + 1(1),\ 0(-1) + 1(2) + 0(1),\ 0(0) + 1(-1) + 0(2) + 0(1),\ \ldots\} \\
&= \{0, 1, 2, -1, 0, \ldots\} .
\end{aligned}$$ //
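The sequence definition translates almost word for word into code. The following is a minimal sketch, assuming a polynomial is stored as a finite Python list of its coefficients with the infinite tail of zeros left implicit; the names `poly_add` and `poly_mul` are our own illustrative choices and are not notation used elsewhere in the text. It reproduces the sum and product of Example 6.1.

```python
def poly_add(p, q):
    """Add coefficient lists [a0, a1, ...] and [b0, b1, ...] termwise."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad with the implicit zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Multiply using c_k = sum of a_i * b_j over all i + j = k."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


p = [0, 1]        # the sequence {0, 1, 0, 0, ...}
q = [1, 2, -1]    # the sequence {1, 2, -1, 0, ...}
print(poly_add(p, q))   # [1, 3, -1]
print(poly_mul(p, q))   # [0, 1, 2, -1]
```

The double loop visits every pair (i, j) exactly once and adds the product to position i + j, which is just the defining sum for the kth coefficient.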

Since the reader probably thought he (or she) already knew what a polyno- 
mial was, and since our definition may not be what it was he (or she) had in 
mind, what we will do now is relate our formal definition to our earlier ele- 
mentary experience with polynomials. We will explain shortly why we are 
going through all of this apparently complicated formalism. 

Given any element $a \in \mathcal{F}$, we associate a polynomial $a'$ defined by

$$a' = \{a, 0, 0, \ldots\} .$$

This is clearly a one-to-one mapping of $\mathcal{F}$ into the set of all polynomials with
coefficients in $\mathcal{F}$. We also note that if $a, b \in \mathcal{F}$, then $a' = \{a, 0, \ldots\}$ and
$b' = \{b, 0, \ldots\}$ so that

$$(a + b)' = \{a + b, 0, \ldots\} = a' + b'$$

and

$$(ab)' = \{ab, 0, \ldots\} = a'b' .$$

If $\mathcal{F}'$ denotes the set of all polynomials $a'$ obtained in this manner, then $\mathcal{F}'$ is a
field isomorphic to $\mathcal{F}$. Because of this isomorphism, we shall identify the elements
of $\mathcal{F}$ with their corresponding polynomials, and write $a = \{a, 0, \ldots\}$.

Now let the symbol $x$ denote the polynomial $\{0, 1, 0, 0, \ldots\}$. We call the
symbol $x$ an indeterminate. Applying our definition of polynomial multiplication,
we see that $x^2 = \{0, 0, 1, 0, \ldots\}$ and, in general,

$$x^n = \{0, \ldots, 0, 1, 0, \ldots\}$$

where the 1 is in the nth position (remember that we start our numbering with
0). We also see that for any $a \in \mathcal{F}$ we have (applying our multiplication rule)

$$a x^n = \{a, 0, \ldots\}\{0, \ldots, 1, 0, \ldots\} = \{0, \ldots, a, 0, \ldots\} .$$

This means that an arbitrary polynomial $p = \{a_0, a_1, \ldots, a_n, 0, \ldots\}$ can be
uniquely expressed in the familiar form

$$p = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n .$$

This discussion has now established a precise meaning to the term
"expression" used at the beginning of this chapter. We will denote the commutative
ring of all polynomials over $\mathcal{F}$ by $\mathcal{F}[x]$. In summary, we see that
while a polynomial was actually defined as a sequence, we showed that any
polynomial $p = \{a_0, a_1, \ldots\} \in \mathcal{F}[x]$ could be uniquely expressed in terms of
the indeterminate $x$ as

$$p = \sum_{i=0}^{n} a_i x^i .$$

Now suppose we are given both a polynomial $p = \sum a_i x^i \in \mathcal{F}[x]$ and any
element $c \in \mathcal{F}$. We define the element $p(c) \in \mathcal{F}$ by $p(c) = \sum a_i c^i$. In other
words, given a polynomial $p \in \mathcal{F}[x]$, the polynomial function $p(x)$ is the
mapping from $\mathcal{F}$ to $\mathcal{F}$ that takes $c \in \mathcal{F}$ into the element $p(c) \in \mathcal{F}$. We call $p(c)$
the value of the polynomial p when c is substituted for x. Because of this, a
polynomial $p \in \mathcal{F}[x]$ is frequently denoted by $p(x)$.

The reason for this apparently complicated technical definition is that it is
possible for two different polynomials in $\mathcal{F}[x]$ to result in the same polynomial
function (see Exercise 6.1.1).
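To make the distinction between a polynomial and its polynomial function concrete, here is a small sketch. It assumes that the integers modulo 2 stand in for a field with two elements, and the helper name `evaluate_mod` is purely illustrative. The sequences {0, 0, ...} and {0, 1, 1, 0, ...} are different polynomials, yet they take the same value at every element of this field.

```python
def evaluate_mod(p, c, modulus):
    """Evaluate a0 + a1*c + ... + an*c^n with arithmetic modulo `modulus`."""
    value = 0
    for a in reversed(p):            # Horner's rule, highest coefficient first
        value = (value * c + a) % modulus
    return value


zero = [0]         # the zero polynomial
s = [0, 1, 1]      # the polynomial x + x^2, a different sequence

# Compare the two polynomial functions on both elements of the field.
values = [(evaluate_mod(zero, c, 2), evaluate_mod(s, c, 2)) for c in (0, 1)]
print(values)      # [(0, 0), (0, 0)]
```

Since the two coefficient lists differ while their values agree everywhere, the sequence (and not the function) is what must be taken as the definition of a polynomial.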

Theorem 6.1 Suppose $p, q \in \mathcal{F}[x]$ and $c \in \mathcal{F}$. Then

(a) $(p \pm q)(c) = p(c) \pm q(c)$.

(b) $(pq)(c) = p(c)q(c)$.

Proof (a) Writing $p = a_0 + a_1 x + \cdots + a_m x^m$ and $q = b_0 + b_1 x + \cdots + b_n x^n$
we have

$$\begin{aligned}
(p \pm q)(c) &= (a_0 \pm b_0) + (a_1 \pm b_1)c + (a_2 \pm b_2)c^2 + \cdots \\
&= (a_0 + a_1 c + a_2 c^2 + \cdots) \pm (b_0 + b_1 c + b_2 c^2 + \cdots) \\
&= p(c) \pm q(c) .
\end{aligned}$$

(b) Using p and q from part (a) and the definition of pq, we have

$$\begin{aligned}
(pq)(c) &= a_0 b_0 + (a_0 b_1 + a_1 b_0)c + (a_0 b_2 + a_1 b_1 + a_2 b_0)c^2 + \cdots \\
&= (a_0 + a_1 c + a_2 c^2 + \cdots)(b_0 + b_1 c + b_2 c^2 + \cdots) \\
&= p(c)q(c) .
\end{aligned}$$ ■
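A quick numerical check of Theorem 6.1 can be sketched as follows. We assume integer coefficients (viewed inside the rationals) stored lowest degree first; the function names are hypothetical helpers chosen for this illustration.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Termwise sum of two coefficient lists (lowest degree first)."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


def poly_mul(p, q):
    """Product via c_k = sum of a_i * b_j over i + j = k."""
    if not p or not q:
        return []
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


def poly_eval(p, c):
    """Value p(c), computed with Horner's rule."""
    value = 0
    for a in reversed(p):
        value = value * c + a
    return value


p = [1, 0, 2]     # 1 + 2x^2
q = [-3, 1]       # -3 + x
c = 5
print(poly_eval(poly_add(p, q), c), poly_eval(p, c) + poly_eval(q, c))  # 53 53
print(poly_eval(poly_mul(p, q), c), poly_eval(p, c) * poly_eval(q, c))  # 102 102
```

Checking one value of c is of course not a proof; it only illustrates that adding or multiplying the sequences first and then substituting c agrees with substituting first.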

It should now be clear that the definitions given above for the algebraic 
properties of polynomials in terms of sequences are just those that we all 
learned in high school for adding and multiplying polynomials together. It 
should be easy for the reader to show that Example 6.1 may be repeated in 
terms of our elementary notion of polynomial addition and multiplication. 

If $p = a_0 + a_1 x + \cdots + a_n x^n$ and $a_n \neq 0$, then we say that the degree of
the polynomial p is n, and we write deg p = n. The term $a_n$ is called the
leading coefficient of the polynomial, and if $a_n = 1$, then the polynomial is
said to be monic. If deg p = 0, then p is said to be constant. By convention,
the degree of the zero polynomial is not defined.

Theorem 6.2 Suppose $p, q \in \mathcal{F}[x]$ are nonzero. Then

(a) $\deg(p + q) \leq \max\{\deg p, \deg q\}$ (where $p + q \neq 0$).

(b) $\deg(pq) = \deg p + \deg q$.

Proof (a) Let $p = a_0 + a_1 x + \cdots + a_m x^m$ and $q = b_0 + b_1 x + \cdots + b_n x^n$ where
$a_m, b_n \neq 0$. Then

$$p + q = (a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_k + b_k)x^k$$

where $k = \max\{m, n\}$. Therefore $\deg(p + q) \leq \max\{\deg p, \deg q\}$ where the
inequality follows since $a_k + b_k$ could equal 0.

(b) From the definition of pq, we see that (using the same p and q from
part (a)) the kth term is $c_k x^k$ where $c_k = \sum_{i+j=k} a_i b_j$. But if $k > m + n$, then
necessarily either $a_i$ or $b_j$ is zero, and therefore $c_k = 0$ for $k > m + n$. For
$k = m + n$ the only term that survives is $c_{m+n} = a_m b_n$. Since $\mathcal{F}$ is
a field and therefore also a division ring, it follows that $a_m, b_n \neq 0$ implies
$a_m b_n \neq 0$ (if $a_m b_n = 0$, then multiplying from the left by $a_m^{-1}$ says that $b_n = 0$,
a contradiction). Thus deg pq = m + n = deg p + deg q. ■
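The role played by the field in part (b) can be seen in a short experiment. The sketch below is illustrative only: `degree` and `poly_mul` are hypothetical helpers, integer coefficients are regarded as elements of the rationals, and the optional `modulus` argument reduces coefficients modulo an integer. Taking the modulus to be 6 gives a coefficient ring that is not a field, and there the degree formula fails.

```python
def degree(p):
    """Degree of a coefficient list, or None for the zero polynomial."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return None


def poly_mul(p, q, modulus=None):
    """Product of coefficient lists, optionally reduced modulo `modulus`."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    if modulus is not None:
        c = [x % modulus for x in c]
    return c


# Over a field: deg(pq) = deg p + deg q.
p, q = [1, 2], [0, 0, 3]                 # 1 + 2x and 3x^2
print(degree(poly_mul(p, q)))            # 3

# Part (a): leading terms may cancel in a sum.
print(degree([1 + 2, 1 + (-1)]))         # 0, for (1 + x) + (2 - x)

# Modulo 6 (not a field): (2x)(3x) = 6x^2 = 0, so the formula breaks down.
print(degree(poly_mul([0, 2], [0, 3], modulus=6)))   # None
```

The last line is exactly the situation excluded by the argument that a nonzero product of leading coefficients requires the existence of inverses.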

Corollary 1 If $p, q \in \mathcal{F}[x]$ are both nonzero, then $\deg p \leq \deg pq$.

Proof Since p and q are nonzero, $\deg p \geq 0$ and $\deg q \geq 0$, and hence
$\deg p \leq \deg p + \deg q = \deg pq$. ■

Corollary 2 If $p, q, r \in \mathcal{F}[x]$, then

(a) $pq = 0$ implies that either $p = 0$ or $q = 0$.

(b) If $pq = rq$ where $q \neq 0$, then $p = r$.

Proof (a) If $p \neq 0$ and $q \neq 0$, then $\deg pq \geq 0$ implies that $pq \neq 0$.

(b) Since $\mathcal{F}[x]$ is a ring, we see that if $pq = rq$, then $(p - r)q = 0$. But $q \neq 0$,
so that by (a) we must have $p - r = 0$, or $p = r$. ■

We note that part (a) of this corollary shows that $\mathcal{F}[x]$ has no zero divisors,
and hence $\mathcal{F}[x]$ forms an integral domain (see Section 1.5).

Corollary 3 Let $p \neq 0$ be an element of $\mathcal{F}[x]$. Then there exists $q \in \mathcal{F}[x]$ such
that $pq = 1$ if and only if deg p = 0, and hence $\mathcal{F}[x]$ is not a field.

Proof If deg p = 0 then $p = a \in \mathcal{F}$ with $a \neq 0$, and thus there exists $q = a^{-1} \in \mathcal{F}$
(hence $q = a^{-1} \in \mathcal{F}[x]$) such that $pq = aa^{-1} = 1$. On the other hand, if $pq = 1$
we have

$$0 = \deg 1 = \deg pq = \deg p + \deg q .$$

But $p \neq 0$ implies that $\deg p \geq 0$, and if $q \neq 0$, we must also have $\deg q \geq 0$. It
then follows that deg p = 0. ■

We now prove a very useful and fundamental result known as the division
algorithm. Essentially, we will show the existence of the polynomial quotient
f/g (although technically this symbol as of yet has no meaning). The polynomials
q and r defined in the following theorem are called the quotient and
remainder respectively. After the proof, we will give an example which
shows that this is just the "long division" we all learned in elementary school.

Theorem 6.3 (Division Algorithm) Given $f, g \in \mathcal{F}[x]$ with $g \neq 0$, there exist
unique polynomials $q, r \in \mathcal{F}[x]$ such that

$$f = qg + r$$

where either $r = 0$ or $\deg r < \deg g$.

Proof The basic idea of the proof is to consider all possible degrees for the 
polynomials f and g, and show that the theorem can be satisfied in each case. 
After proving the existence of the polynomials q and r, we shall prove their 
uniqueness. 

If $f = 0$ we simply choose $q = r = 0$. Now suppose that

$$\begin{aligned}
f &= a_0 + a_1 x + \cdots + a_m x^m \\
g &= b_0 + b_1 x + \cdots + b_n x^n
\end{aligned}$$

where $a_m, b_n \neq 0$. If $m = n = 0$, then by Corollary 3 of Theorem 6.2, there
exists $g^{-1} \in \mathcal{F}[x]$ such that $g^{-1}g = 1$, and therefore $f = f(g^{-1}g) = (fg^{-1})g + 0$
satisfies our requirements. Next, if $m = 0$ and $n > 0$, then we may write
$f = 0g + f$ with $\deg r = \deg f < \deg g$.

We now assume that $m > 0$ and proceed by induction on m. In other
words, we assume that q and r can be found for all polynomials f with
$\deg f \leq m - 1$ and proceed to construct new polynomials q and r for deg f = m. First
note that if $n > m$ we may again take $f = 0g + f$ to satisfy the theorem. Thus we
need only consider the case of $n \leq m$.

Define the polynomial

$$f_1 = f - (a_m/b_n)x^{m-n} g .$$

Then the coefficient of the $x^m$ term in $f_1$ is 0 (it cancels out on the right hand
side), and hence $\deg f_1 \leq m - 1$. Therefore, by our induction hypothesis, there
exist polynomials $q_1$ and $r_1$ in $\mathcal{F}[x]$ with either $r_1 = 0$ or $\deg r_1 < \deg g$ such that
$f_1 = q_1 g + r_1$. Substituting the definition of $f_1$ in this equation yields

$$f = [(a_m/b_n)x^{m-n} + q_1]g + r_1 .$$

If we define $r = r_1$ and $q = (a_m/b_n)x^{m-n} + q_1$, we see that $f = qg + r$ where
either $r = 0$ or $\deg r < \deg g$. This proves the existence of the polynomials q
and r, and all that remains is to prove their uniqueness.

Suppose that

$$f = qg + r = q'g + r'$$

where both r and r' satisfy the theorem, and assume that $r \neq r'$. Then

$$r - r' = (q' - q)g \neq 0$$

where

$$\deg(r - r') < \deg g$$

by Theorem 6.2(a). On the other hand, from Theorem 6.2(b) we see that

$$\deg(r - r') = \deg[(q' - q)g] = \deg(q' - q) + \deg g \geq \deg g .$$

This contradiction shows that in fact $r' = r$. We now have $(q' - q)g = 0$ with
$g \neq 0$, and hence by Corollary 2(a) of Theorem 6.2, we have $q' - q = 0$, and
therefore $q' = q$. ■

Let us give an example of the division algorithm that should clarify what 
was done in the theorem. 

Example 6.2 Consider the polynomials

$$\begin{aligned}
f &= 2x^4 + x^2 - x + 1 \\
g &= 2x - 1 .
\end{aligned}$$

Following the proof of Theorem 6.3 we have

$$f_1 = f - x^3 g = x^3 + x^2 - x + 1 .$$

Now let

$$f_2 = f_1 - (1/2)x^2 g = (3/2)x^2 - x + 1 .$$

Again, we let

$$f_3 = f_2 - (3/4)x g = (-1/4)x + 1$$

so that

$$f_4 = f_3 + (1/8)g = 7/8 .$$

Since deg(7/8) < deg g, we are finished with the division. Combining the
above polynomials we see that

$$f = [x^3 + (1/2)x^2 + (3/4)x - (1/8)]g + f_4$$

and therefore

$$\begin{aligned}
q &= x^3 + (1/2)x^2 + (3/4)x - (1/8) \\
r &= 7/8 .
\end{aligned}$$

This may also be written out in a more familiar form as

                 x^3 + (1/2)x^2 + (3/4)x - (1/8)
             ____________________________________
    2x - 1 ) 2x^4 +         x^2 -      x + 1
             2x^4 - x^3
             ----------
                    x^3 +   x^2 -      x + 1
                    x^3 - (1/2)x^2
                    --------------
                         (3/2)x^2 -      x + 1
                         (3/2)x^2 - (3/4)x
                         -----------------
                                   -(1/4)x + 1
                                   -(1/4)x + (1/8)
                                   ---------------
                                               7/8

It should be noted that at each step in the division, we eliminated the highest 
remaining power of f by subtracting the appropriate multiple of g. / 
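The procedure used in the proof of Theorem 6.3 and in Example 6.2 can be sketched in a few lines of code. This is only an illustration under stated assumptions: coefficients are rational numbers (Python's `Fraction`), polynomials are lists with the lowest degree first, and `poly_divmod` is a name of our own choosing. At each pass the leading term of the current remainder is cancelled by subtracting a suitable multiple of g.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and either r == [] or deg r < deg g."""
    r = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    while r and r[-1] == 0:          # drop zero leading coefficients
        r.pop()
    while g and g[-1] == 0:
        g.pop()
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # the exponent m - n in the proof
        coeff = r[-1] / g[-1]        # the ratio a_m / b_n in the proof
        q[shift] = coeff
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b
        while r and r[-1] == 0:      # the leading term has been cancelled
            r.pop()
    return q, r


f = [1, -1, 1, 0, 2]     # 2x^4 + x^2 - x + 1
g = [-1, 2]              # 2x - 1
q, r = poly_divmod(f, g)
print(q)   # coefficients -1/8, 3/4, 1/2, 1 (lowest degree first)
print(r)   # the single coefficient 7/8
```

Exact rational arithmetic is used so that the cancellation of the leading term is exact; with floating point numbers the test for a zero leading coefficient would need a tolerance.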



Exercises 

1. Let $\mathcal{F} = \{0, 1\}$ be the field consisting of only two elements, and define
addition and multiplication on these elements in the obvious way (see
Exercise 1.5.17). Show that the distinct polynomials $x^2 - x$ and 0 define
the same polynomial function.

2. Use the division algorithm to find the quotient and remainder when f =
2x^4 - … + x - 1 $\in \mathbb{R}[x]$ is divided by g = 3x^3 - x^2 + 3 $\in \mathbb{R}[x]$.
3. Consider the polynomials p = {2, 0, 1, 1} and q = {1, 1, -1} over $\mathbb{R}$.
Evaluate the product pq by applying the definition. Show that this yields
the same result as directly multiplying together the polynomial functions p
and q.

4. Given a polynomial $p = a_n x^n + \cdots + a_1 x + a_0$, we define its formal
derivative to be the polynomial $Dp = n a_n x^{n-1} + \cdots + 2a_2 x + a_1$. In other
words, $D: \mathcal{F}[x] \to \mathcal{F}[x]$ is a differentiation operator. Prove D(p + q) =
Dp + Dq and D(pq) = p(Dq) + (Dp)q .

5. Find the remainder when ix^ + 3x^ + x^ - 2ix + 1 $\in \mathbb{C}[x]$ is divided by
x + i $\in \mathbb{C}[x]$.

6.2 FACTORIZATION OF POLYNOMIALS 

If f(x) is a polynomial in $\mathcal{F}[x]$, then $c \in \mathcal{F}$ is said to be a zero (or root) of f if
f(c) = 0. We shall also sometimes say that c is a solution of the polynomial 
equation f(x) = 0. We will see that information about the roots of a polynomial 
plays an extremely important role throughout much of the remainder of this 
text. 

If $f, g \in \mathcal{F}[x]$ and $g \neq 0$, then we say that f is divisible by g (or g divides
f) over $\mathcal{F}$ if $f = qg$ for some $q \in \mathcal{F}[x]$. In other words, f is divisible by g if the
remainder in the division of f by g is zero. In this case we also say that g is a
factor of f (over $\mathcal{F}$). It is standard notation to write $g \mid f$ when we wish to say
that g divides f, or to write $g \nmid f$ when g does not divide f.

The next theorem is known as the remainder theorem, and its corollary is 
known as the factor theorem. 

Theorem 6.4 (Remainder Theorem) Suppose $f \in \mathcal{F}[x]$ and $c \in \mathcal{F}$. Then the
remainder in the division of f by $x - c$ is $f(c)$. In other words,

$$f(x) = (x - c)q + f(c) .$$

Proof We see from the division algorithm that $f = (x - c)q + r$ where either
$r = 0$ or $\deg r < \deg(x - c) = 1$, and hence either $r = 0$ or $\deg r = 0$ (in which
case $r \in \mathcal{F}$). In either case, we may substitute c for x to obtain

$$f(c) = (c - c)q(c) + r = r .$$ ■
Corollary (Factor Theorem) If $f \in \mathcal{F}[x]$ and $c \in \mathcal{F}$, then $x - c$ is a factor of
f if and only if $f(c) = 0$.

Proof Rephrasing the statement of the corollary as $f = q(x - c)$ if and only if
$f(c) = 0$, it is clear that this follows directly from the theorem. ■

Example 6.3 If we divide $f = x^3 - 5x^2 + 7x$ by $g = x - 2$, we obtain
$q = x^2 - 3x + 1$ and $r = 2$. It is also easy to see that $f(2) = 8 - 5(4) + 7(2) = 2$ as it
should be according to Theorem 6.4. /
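Division by a linear factor x - c is simple enough that quotient and remainder can be read off from a single pass of Horner's rule (often called synthetic division). The sketch below is illustrative and not part of the original text; `divide_by_linear` is a hypothetical helper, and coefficient lists are again stored lowest degree first. It reproduces Example 6.3, and its last output is exactly f(c), as the remainder theorem requires.

```python
def divide_by_linear(f, c):
    """Divide f by x - c; return (q, r) with f = (x - c) q + r and r = f(c)."""
    if not f:
        return [], 0
    partial = []
    acc = 0
    for a in reversed(f):        # Horner's rule from the leading coefficient
        acc = acc * c + a
        partial.append(acc)
    remainder = partial.pop()    # the final Horner value is f(c)
    partial.reverse()            # back to lowest degree first
    return partial, remainder


f = [0, 7, -5, 1]                # x^3 - 5x^2 + 7x
q, r = divide_by_linear(f, 2)
print(q)   # [1, -3, 1], i.e. x^2 - 3x + 1
print(r)   # 2
```

By the factor theorem, x - c divides f precisely when the second returned value is zero.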

Let R be a commutative ring with unit element. An element $u \in R$ is
called a unit (not a unit element) if there exists $r \in R$ such that $ur = 1$. Other
ways to say this are that u divides 1, or that a unit is an element whose inverse
is also in the ring. We leave it to the reader to show that $u \in R$ is a unit if and
only if it is a factor of every element of R (see Exercise 6.2.1). An element
$p \in R$ that is neither zero nor a unit is said to be prime if $p = ab$ implies that
either a or b is a unit. Thus a prime element is one that cannot be factored in a
nontrivial way.

Example 6.4 Since for any integer $n \neq \pm 1$ the number 1/n is not an integer, it
should be clear that the ring of integers $\mathbb{Z}$ has only the units 1 and -1. On the
other hand, if $\mathcal{F}$ is any field and $a \in \mathcal{F}$ is nonzero, then $a^{-1}$ is also a member of $\mathcal{F}$, and
hence any nonzero element of a field is a unit. In particular, the units of the
ring $\mathcal{F}[x]$ are just the polynomials of degree zero (i.e., the nonzero constant
polynomials).

If we consider the ring of integers $\mathbb{Z}$, then a number $p \in \mathbb{Z}$ with $p \neq \pm 1$ or 0
will be prime if the only divisors of p are $\pm 1$ and $\pm p$. However, if we
consider the field $\mathbb{R}$, then any nonzero element of $\mathbb{R}$ (i.e., any nonzero real
number) is a unit, and hence the notion of a prime real number is not very
useful. /

In the particular case of $R = \mathcal{F}[x]$, a prime polynomial is frequently called
an irreducible polynomial. A polynomial that is not irreducible is said to be
reducible. Two polynomials $f, g \in \mathcal{F}[x]$ are said to be associates if $f = cg$ for
some nonzero $c \in \mathcal{F}$, and in general, two elements $a, b \in R$ are said to be
associates if $a = ub$ where u is a unit of R. We leave it to the reader to show
that this defines an equivalence relation on a commutative ring with unit
element (Exercise 6.2.3).

It should be clear that any nonzero polynomial has exactly one monic
polynomial as an associate since we can always write

$$a_0 + a_1 x + \cdots + a_n x^n = a_n (a_0 a_n^{-1} + a_1 a_n^{-1} x + \cdots + x^n) .$$

It should also be clear that any polynomial f with $\deg f \geq 1$ has its associates
and the set of nonzero constant polynomials as divisors. Thus we see that a
nonzero polynomial f is prime if and only if f = gh implies that either g or h is
of degree zero, and hence the other is an associate of f.

It is important to realize that whether or not a polynomial is prime depends
on the particular field $\mathcal{F}$. For example, since $x^2 - 2 = (x - \sqrt{2})(x + \sqrt{2})$, we see
that $x^2 - 2$ is prime over $\mathbb{Q}$, but is not prime over $\mathbb{R}$.

Returning to our commutative ring R with unit element, we say that an
element $d \in R$ is a greatest common divisor (frequently denoted simply by
gcd) of the elements $a_1, \ldots, a_n \in R$ if

(1) $d \mid a_i$ for every i = 1, . . . , n (i.e., d is a common divisor of the $a_i$);

(2) If $c \in R$ is such that $c \mid a_i$ for every i = 1, . . . , n, then $c \mid d$.

Two distinct elements $a, b \in R$ are said to be relatively prime if their greatest
common divisor is a unit of R. Note that we have referred to a greatest common
divisor, implying that there may be more than one. This is only true in a
certain sense as the next theorem shows.

Theorem 6.5 Let $f_1, \ldots, f_n$ be nonzero polynomials in $\mathcal{F}[x]$. Then there
exists at least one greatest common divisor d of the set $\{f_1, \ldots, f_n\}$.
Moreover, this greatest common divisor is unique up to a unit factor, and can
be expressed in the form $d = \sum_{i=1}^{n} h_i f_i$ for some set of polynomials $h_i \in \mathcal{F}[x]$.

Proof Consider the set S of all polynomials in $\mathcal{F}[x]$ of the form $g_1 f_1 + \cdots + g_n f_n$
where each $g_i$ is an arbitrary element of $\mathcal{F}[x]$. Then in particular, each $f_i$
is an element of S, and in addition, if $p \in S$ and $q \in \mathcal{F}[x]$, then $pq \in S$. Let D
be the set of degrees of all nonzero polynomials in S. Then D is just a collection
of nonnegative integers, and hence by the well-ordering principle
(Section 0.5), D has a least element $\alpha$. This means that there exists a nonzero
polynomial $d = h_1 f_1 + \cdots + h_n f_n \in S$ such that $\alpha = \deg d \leq \deg c$ for all
nonzero polynomials $c \in S$. We first show that $d \mid f_i$ for every i = 1, . . . , n.

By the division algorithm, for each i = 1, . . . , n we have $f_i = q_i d + r_i$
where either $r_i = 0$ or $\deg r_i < \deg d$. But $r_i = f_i - q_i d \in S$ so that if $r_i \neq 0$, we
would have $\deg r_i < \deg d$ which contradicts the definition of d. Therefore we
must have $r_i = 0$ and hence $f_i = q_i d$ so that $d \mid f_i$ for every i = 1, . . . , n. While
this shows that d is a common divisor of $\{f_i\}$, we must show that it is in fact a
greatest common divisor.

If c is any other common divisor of $\{f_1, \ldots, f_n\}$, then by definition there
exist polynomials $g_i$ such that $f_i = g_i c$ for each i = 1, . . . , n. But then

$$d = \sum h_i f_i = \sum h_i (g_i c) = \Bigl( \sum h_i g_i \Bigr) c$$

so that $c \mid d$. This proves that d is a greatest common divisor.

Now suppose d' is another greatest common divisor of the set $\{f_1, \ldots, f_n\}$.
Then by definition of greatest common divisor we must have both $d \mid d'$ and
$d' \mid d$, so that $d' = ud$ and $d = vd'$ for some polynomials $u, v \in \mathcal{F}[x]$. Multiplying
the second of these equations by u, we see that $uvd' = ud = d'$, and therefore
$d'(1 - uv) = 0$. By Corollary 2(a) of Theorem 6.2, we then have $1 - uv = 0$ so
that $uv = 1$ and hence u and v are units. (Alternatively, the fact that deg d =
deg d' implies that u and v must be of degree zero, and hence are units.) ■

What we have shown is that a gcd exists, and is unique up to its associates. 
Therefore, if we restrict ourselves to monic greatest common divisors, then we 
have proved the existence of a unique gcd. 

Corollary 1 Let $q_1, \ldots, q_n \in \mathcal{F}[x]$ be relatively prime (i.e., they have no
common divisors other than units). Then there exist elements $h_1, \ldots, h_n \in \mathcal{F}[x]$
such that $h_1 q_1 + \cdots + h_n q_n = 1$.

Proof This is an obvious special case of Theorem 6.5. ■

Corollary 2 Suppose $f, g, p \in \mathcal{F}[x]$ where p is prime and $p \mid fg$. Then either $p \mid f$
or $p \mid g$.

Proof Since p is prime, its only divisors are units and its associates.
Therefore, if we assume that $p \nmid f$, then the only greatest common divisor of p
and f is a unit, and thus p and f are relatively prime. Applying Theorem 6.5,
we may write $up + vf = 1$ for some $u, v \in \mathcal{F}[x]$, and hence multiplying by g
yields $pug + fgv = g$. But $p \mid fg$ so that $fg = qp$ for some $q \in \mathcal{F}[x]$, and thus we
have $p(ug + qv) = g$ so that $p \mid g$. It is obvious that had we started with the
assumption that $p \nmid g$, we would have found that $p \mid f$. ■

By choosing $f = f_1$ and $g = f_2 \cdots f_n$ in Corollary 2 we have the following
obvious generalization.

Corollary 2' If $p, f_1, f_2, \ldots, f_n \in \mathcal{F}[x]$ where p is prime and $p \mid f_1 f_2 \cdots f_n$,
then $p \mid f_i$ for some i = 1, . . . , n.

While Theorem 6.5 proves the existence of a greatest common divisor, it
is not of any help in actually computing one. Given two polynomials, we can
find their gcd by a procedure known as the Euclidean algorithm (compare
Section 0.7). This approach, illustrated in the next example, is also an alternative
proof of Theorem 6.5, the general case as stated in the theorem following
by induction.

Example 6.5 (Euclidean algorithm) Suppose $f, g \in \mathcal{F}[x]$ and $f \neq 0$.
We show the existence of a unique monic polynomial $d \in \mathcal{F}[x]$ such that

(1) $d \mid f$ and $d \mid g$.

(2) If $c \in \mathcal{F}[x]$ is such that $c \mid f$ and $c \mid g$, then $c \mid d$.

First note that if $g = 0$ and $a_m$ is the leading coefficient of $f$, then the
monic polynomial $d = a_m^{-1}f$ satisfies both requirements (1) and (2). Now
assume that $g \neq 0$ also. By the division algorithm, there exist unique
polynomials $q_1$ and $r_1$ such that

$$f = gq_1 + r_1$$

with either $r_1 = 0$ or $\deg r_1 < \deg g$. If $r_1 = 0$, then $g \mid f$ and
$d = g$ satisfies (1) and (2). If $r_1 \neq 0$, then we apply the division
algorithm again to obtain polynomials $q_2$ and $r_2$ such that

$$g = r_1q_2 + r_2 .$$

If $r_2 = 0$, then $r_1 \mid g$ which implies that $r_1 \mid f$, and thus $d = r_1$
is a common divisor. (It still remains to be shown that this $d$ is a greatest
common divisor.) If $r_2 \neq 0$, then we continue the process, thus obtaining
the following progression:

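$$
\begin{aligned}
f &= gq_1 + r_1 & \deg r_1 &< \deg g \\
g &= r_1q_2 + r_2 & \deg r_2 &< \deg r_1 \\
r_1 &= r_2q_3 + r_3 & \deg r_3 &< \deg r_2 \\
&\;\;\vdots \\
r_{k-2} &= r_{k-1}q_k + r_k & \deg r_k &< \deg r_{k-1} \\
r_{k-1} &= r_kq_{k+1}
\end{aligned}
$$

This progression must terminate as shown since the degree of any nonzero
polynomial is a nonnegative integer and
$\deg r_1 > \deg r_2 > \cdots > \deg r_k \geq 0$. Letting $r_k$ be the last
nonzero remainder, we claim that $r_k = d$.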

To see this, first note that $r_k \mid r_{k-1}$ since $r_{k-1} = r_kq_{k+1}$. Next,
we see that

$$r_{k-2} = r_{k-1}q_k + r_k = r_kq_{k+1}q_k + r_k$$

and therefore $r_k \mid r_{k-2}$. Continuing this procedure, we find that
$r_k \mid r_{k-1}$, $r_k \mid r_{k-2}$, . . . , $r_k \mid r_1$, $r_k \mid g$ and finally
$r_k \mid f$. This shows that $d = r_k$ satisfies (1). Now suppose that $c \mid f$
and $c \mid g$. Then, since $r_1 = f - gq_1$ we must have $c \mid r_1$. Similarly
$r_2 = g - r_1q_2$ so that $c \mid r_2$. Continuing this process, it is clear that
we eventually arrive at the conclusion that $c \mid r_k$, thus proving (2) for the
choice $d = r_k$. Finally, if $r$ is the leading coefficient of $r_k$, then
$r^{-1}r_k$ is a monic polynomial satisfying (1) and (2), and its uniqueness
follows exactly as in the proof of Theorem 6.5. ∆

Example 6.6 As a specific illustration of the preceding example, consider
the polynomials $f = x^4 - x^3 - x^2 + 1$ and $g = x^3 - 1$ over the field
$\mathbb{Q}$. Dividing $f$ by $g$ we obtain

$$x^4 - x^3 - x^2 + 1 = (x^3 - 1)(x - 1) + (-x^2 + x) .$$

Now divide $g$ by $r_1 = -x^2 + x$ to obtain

$$x^3 - 1 = (-x^2 + x)(-x - 1) + (x - 1) .$$

Lastly, we divide $r_1$ by $r_2 = x - 1$ to find

$$-x^2 + x = (x - 1)(-x)$$

and therefore the gcd of $f$ and $g$ is $x - 1$. ∆
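The procedure of Examples 6.5 and 6.6 is mechanical enough to be sketched in a few lines of code. The following is a minimal illustration only, under assumptions that are not part of the text: the field is taken to be $\mathbb{Q}$ (via Python's `Fraction`), a polynomial is stored as a list of coefficients with the constant term first, and the names `trim`, `poly_divmod` and `poly_gcd` are hypothetical helpers chosen for this sketch.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Division algorithm: (q, r) with f = g*q + r and r = 0 or deg r < deg g."""
    g = [Fraction(c) for c in trim(g)]
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in trim(f)]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]          # cancels the leading term of r
        q[shift] = coeff
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def poly_gcd(f, g):
    """Monic gcd: the last nonzero remainder, divided by its leading coefficient."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_divmod(f, g)[1]
    return [Fraction(c) / f[-1] for c in f] if f else []

# Example 6.6: f = x^4 - x^3 - x^2 + 1 and g = x^3 - 1
f = [1, 0, -1, -1, 1]
g = [-1, 0, 0, 1]
print(poly_divmod(f, g))   # quotient x - 1, remainder -x^2 + x
print(poly_gcd(f, g))      # coefficients of x - 1
```

Each pass through the loop in `poly_gcd` replaces the pair $(f, g)$ by $(g, r)$, which is exactly one line of the progression in Example 6.5; the final division by the leading coefficient produces the monic representative.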

Our next very important result is known as the unique factorization theorem.
Recall that by definition, a prime polynomial is not a unit, and thus has
positive degree.

Theorem 6.6 (Unique Factorization Theorem) Every nonzero element
$f \in \mathcal{F}[x]$ is either a unit, or is expressible as a unique (up to
associates) finite product of prime elements.

Proof We first show that $f \in \mathcal{F}[x]$ is expressible as a product of
prime polynomials. Afterwards we will prove uniqueness. Our approach is by
induction on $\deg f$; in other words, we assume that $\deg f > 1$ (if
$\deg f = 0$ then $f$ is a unit, and if $\deg f = 1$ the theorem is obvious),
and suppose that the theorem is true for all $g \in \mathcal{F}[x]$ with
$\deg g < \deg f$. We will show that the theorem is true for $f$.

Assume $f$ is reducible (or else there is nothing to prove) so that $f = pq$
where neither $p$ nor $q$ is a unit. By Theorem 6.2(b) we have
$\deg p < \deg p + \deg q = \deg f$, and similarly $\deg q < \deg f$.
Therefore, by our induction hypothesis, both $p$ and $q$ can be written as a
finite product of prime elements in $\mathcal{F}[x]$, and hence the same is true
of $f = pq$.

To prove the uniqueness of the product, assume that

$$f = p_1p_2 \cdots p_n = q_1q_2 \cdots q_m$$

where each of the $p_i$ and $q_j$ are prime. Since $p_1 \mid p_1p_2 \cdots p_n$, it
follows that $p_1 \mid q_1q_2 \cdots q_m$. By Corollary 2' of Theorem 6.5, it then
follows that $p_1 \mid q_j$ for some $j = 1, \dots, m$. But since both $p_1$ and
$q_j$ are prime and $p_1 \mid q_j$, they must be associates, and hence
$q_j = u_1p_1$ where $u_1$ is a unit in $\mathcal{F}[x]$. This means that

$$p_1p_2 \cdots p_n = q_1q_2 \cdots q_m = q_1q_2 \cdots q_{j-1}u_1p_1q_{j+1} \cdots q_m .$$

Cancelling $p_1$ from both sides of this equation (using Theorem 6.2, Corollary
2) results in

$$p_2 \cdots p_n = u_1q_1 \cdots q_{j-1}q_{j+1} \cdots q_m .$$

Repeating this argument, we next eliminate $p_2$ and one of the remaining
factors on the right. Continuing in this manner, we pairwise eliminate one of
the $p_i$ and one of the $q_k$ with each operation, always replacing a $q_k$ with
a corresponding unit $u_k$. But the primes on one side of this equation can not
be eliminated before those on the other side because this would imply that a
product of prime polynomials was equal to a unit, which is impossible.
Therefore $n = m$, and the expansion of $f$ as a product of prime elements must
be unique up to an associate. ■

Note that the expansion proved in this theorem for $f \in \mathcal{F}[x]$ is
completely unique (except for order) if we require that the prime polynomials
be monic.

Example 6.7 Consider the polynomial $p = 3x^4 - 3x^2 - 6 \in \mathcal{F}[x]$. Using
the fields $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ we can factor $p$ three
different ways depending on $\mathcal{F}$:

$$
\begin{aligned}
&3(x^2 - 2)(x^2 + 1) && \text{in } \mathbb{Q}[x] \\
&3(x + \sqrt{2})(x - \sqrt{2})(x^2 + 1) && \text{in } \mathbb{R}[x] \\
&3(x + \sqrt{2})(x - \sqrt{2})(x + i)(x - i) && \text{in } \mathbb{C}[x]
\end{aligned}
$$

In each case, $p$ is a product of prime polynomials relative to the appropriate
field. ∆

Exercises 

1. Let $R$ be a commutative ring with unit element. Show that $u \in R$ is a unit
of $R$ if and only if $u$ is a factor of every element of $R$.

2. Let $R$ be an integral domain with unit element and suppose it is true that
$a \mid b$ and $b \mid a$ for some $a, b \in R$. Show that $a = ub$ where $u$ is a
unit in $R$.

3. Show that the property of being associates defines an equivalence relation
on a commutative ring with unit element.

4. Let $\mathcal{F}$ be an arbitrary field, and suppose $p \in \mathcal{F}[x]$ is of
degree $\leq 3$. Prove that $p$ is prime in $\mathcal{F}[x]$ if and only if $p$ is
either of degree 1, or has no zeros in $\mathcal{F}$. Give an example to show
that this result is not true if $\deg p > 3$.

5. Factor the following polynomials into their prime factors in both
$\mathbb{R}[x]$ and $\mathbb{Q}[x]$:

(a) 2x^ - x^ + X+ 1. 

(b) $3x^3 + 2x^2 - 4x + 1$.

(c) x^ + 1. 

(d) x'^ + 16. 

6. Let $\mathcal{F} = \{0, 1\}$ be the field consisting of only two elements, and
define addition and multiplication on $\mathcal{F}$ in the obvious way. Factor
the following polynomials into primes in $\mathcal{F}[x]$:

(a) + X + 1. 

(b) x^ + 1. 

(c) x"* + x^ + 1. 

(d) x'^ + 1. 

7. Let $\mathcal{F}$ be as in the previous problem. Find the greatest common divisor of
x^ + x^ + X + 1 and x^ + x"^ + x^ + x^ + x + 1 over ^[x]. 

8. Find the greatest common divisor of the following pairs of polynomials
over $\mathbb{R}[x]$. Express your result in the form defined in Theorem 6.5.

(a) 4x^ + 2x2 - 2x - 1 2x^ - x^ + x + 1. 

(b) x^ - X + 1 and 2x^ + x^ + x - 5. 

(c) x^ + 3x2 _|_ 2 and x^ - x. 

(d) x^ + x^ - 2x - 2 and x^ - Ix^ + - 6x. 

9. Use the remainder theorem to find the remainder when 2x^ - 3x^ + 2x + 
1 e R[x] is divided by: 

(a) $x - 2 \in \mathbb{R}[x]$.

(b) $x + 3 \in \mathbb{R}[x]$.

10. (a) Is $x - 3$ a factor of 3x^ - 9x^ - 7x + 21 over $\mathbb{Q}[x]$?

(b) Is $x + 2$ a factor of x^ + Sx^ + 6x - 8 over $\mathbb{R}[x]$?

(c) For which $k \in \mathbb{Q}$ is $x - 1$ a factor of x^ + 2x^ + x + k over
$\mathbb{Q}[x]$?

(d) For which $k \in \mathbb{C}$ is $x + i$ a factor of ix^ + 3x^ + x^ - 2ix + k
over $\mathbb{C}[x]$?

11. (a) Construct an example to show that the division algorithm is not true
if $\mathcal{F}$ is replaced by the integral domain $\mathbb{Z}$.

(b) Prove that if the division algorithm is true for polynomials over an 
integral domain D, then D must be a field. 

12. Determine the monic associate of: 

(a) 2x^ -x + 1 eQ[x]. 

(b) -ix^ + X + 1 E C[x]. 

13. Let $f = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{Z}[x]$ be a polynomial with
integer coefficients, and suppose $r/s \in \mathbb{Q}$ is a rational root of $f$.
Assume that $r$ and $s$ are relatively prime. Prove that $r \mid a_0$ and
$s \mid a_n$.



6.3 POLYNOMIAL IDEALS 

We now apply the formalism of Section 1.5 to polynomials. If the reader has 
not yet studied that section (or does not remember it), now is the time to go 
back and do so (again if necessary). 

Example 6.8 Let $A$ and $B$ be ideals of $\mathcal{F}[x]$. We show that

$$A \cap B = \{ f \in \mathcal{F}[x] : f \in A \text{ and } f \in B \}$$

and

$$A + B = \{ f + g \in \mathcal{F}[x] : f \in A \text{ and } g \in B \}$$

are both ideals of $\mathcal{F}[x]$. Indeed, if $f, g \in A \cap B$, then
$f \pm g \in A$ and $f \pm g \in B$ since $A$ and $B$ are ideals. Therefore
$f \pm g \in A \cap B$ so that $A \cap B$ is a subgroup of $\mathcal{F}[x]$ under
addition. Similarly, if $f \in A \cap B$ and $g \in \mathcal{F}[x]$, then
$fg \in A$ and $fg \in B$ so that $fg \in A \cap B$, and hence $A \cap B$ is an
ideal. Now suppose $f_1 + g_1,\ f_2 + g_2 \in A + B$. Then

$$(f_1 + g_1) \pm (f_2 + g_2) = (f_1 \pm f_2) + (g_1 \pm g_2) \in A + B$$

so that $A + B$ is also a subgroup of $\mathcal{F}[x]$ under addition. Finally,
suppose that $f_1 + g_1 \in A + B$ and $h \in \mathcal{F}[x]$. Then

$$(f_1 + g_1)h = f_1h + g_1h \in A + B$$

so that $A + B$ is an ideal also. ∆

Recall that $\mathcal{F}[x]$ is not only a ring, it is in fact an integral domain
(Corollary 2 of Theorem 6.2). Since $\mathcal{F}[x]$ is a ring, we see that if
$p \in \mathcal{F}[x]$, then $p$ can be used to generate the principal ideal
$(p)$ of $\mathcal{F}[x]$. Note that by construction, the ideal $(p)$ can not
contain any prime polynomials (other than associates of $p$) even if $p$ itself
is prime. This is because any $q \in (p)$ can be written in the form $q = pr$
for some $r \in \mathcal{F}[x]$, and hence $p \mid q$. We will also write the
principal ideal $(p)$ in the form

$$(p) = p\,\mathcal{F}[x] = \{ pf : f \in \mathcal{F}[x] \} .$$

Our next theorem is frequently useful when working with quotient rings
(see Theorem 1.13) of the general form $\mathcal{F}[x]/(p)$.

Theorem 6.7 Suppose $p = a_0 + a_1x + \cdots + a_nx^n \in \mathcal{F}[x]$,
$a_n \neq 0$, and let $I = (p)$. Then every element of $\mathcal{F}[x]/I$ can be
uniquely expressed in the form

$$I + (b_0 + b_1x + \cdots + b_{n-1}x^{n-1})$$

where $b_0, \dots, b_{n-1} \in \mathcal{F}$.

Proof Choose any $I + f \in \mathcal{F}[x]/I$. By the division algorithm
(Theorem 6.3) we can write $f = pq + r$ for some $q, r \in \mathcal{F}[x]$ with
either $r = 0$ or $\deg r < \deg p$. But by definition of $I$, we have
$pq \in I$ so that

$$I + f = I + (pq + r) = I + r .$$

This shows that $I + f$ has the form desired.

To prove the uniqueness of this representation, suppose that

$$I + (b_0 + b_1x + \cdots + b_{n-1}x^{n-1}) = I + (c_0 + c_1x + \cdots + c_{n-1}x^{n-1}) .$$

Then (by adding $I + (-c_0 - c_1x - \cdots - c_{n-1}x^{n-1})$ to both sides) we
see that

$$(b_0 - c_0) + (b_1 - c_1)x + \cdots + (b_{n-1} - c_{n-1})x^{n-1} \in I .$$

But the degree of any nonzero polynomial in $I$ must be greater than or equal to
$n = \deg p$ (by definition of $I$ and Theorem 6.2(b)), and therefore it follows
that

$$(b_0 - c_0) + (b_1 - c_1)x + \cdots + (b_{n-1} - c_{n-1})x^{n-1} = 0 .$$

Since two polynomials are equal if and only if their coefficients are equal, this
means that $b_i = c_i$ for every $i = 0, \dots, n - 1$. ■
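The existence half of Theorem 6.7 is constructive: the representative of $I + f$ is simply the remainder of $f$ on division by $p$. The sketch below illustrates this under assumptions of our own (coefficients in $\mathbb{Q}$, polynomials stored as coefficient lists with the constant term first, and hypothetical helper names `poly_rem` and `coset_rep`); it is not taken from the text.

```python
from fractions import Fraction

def poly_rem(f, p):
    """Remainder r of f on division by p, so that r = 0 or deg r < deg p."""
    r = [Fraction(c) for c in f]
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    if not p:
        raise ZeroDivisionError("p must be nonzero")
    while r and r[-1] == 0:
        r.pop()
    while len(r) >= len(p):
        shift = len(r) - len(p)
        coeff = r[-1] / p[-1]
        for i, c in enumerate(p):
            r[i + shift] -= coeff * c   # subtract coeff * x^shift * p
        while r and r[-1] == 0:
            r.pop()
    return r

def coset_rep(f, p):
    """Coefficients (b_0, ..., b_{n-1}) of the representative of I + f, n = deg p."""
    r = poly_rem(f, p)
    n = len(poly_rem(p, [1]) or p) and len([c for c in p]) - 1
    while p[n] == 0:                    # skip any trailing zero coefficients of p
        n -= 1
    return r + [Fraction(0)] * (n - len(r))

# With p = x^2 + 1: f = x^3 + 2x + 5 and f + (x + 3)p lie in the same coset.
p = [1, 0, 1]
print(coset_rep([5, 2, 0, 1], p))   # coefficients of 5 + x
print(coset_rep([8, 3, 3, 2], p))   # the same representative
```

Two polynomials give the same output exactly when their difference is a multiple of $p$, which is the uniqueness statement of the theorem.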

It is an interesting fact that every ideal of $\mathcal{F}[x]$ is actually a
principal ideal. We prove this in our next theorem.

Theorem 6.8 Every ideal of the ring $\mathcal{F}[x]$ is a principal ideal.

Proof Let $I$ be any ideal of $\mathcal{F}[x]$. If $I = \{0\}$, then $I$ is just
the principal ideal $(0)$. Now assume that $I \neq \{0\}$ and let $g$ be any
nonzero polynomial of least degree in $I$. (That $g$ exists follows from the
well-ordering principle. In other words, if $S$ is the set of degrees of all
nonzero polynomials in $I$, then $S$ has a least element.) From the definitions
it is clear that $(g) \subset I$. We now show that $I \subset (g)$ which will
then prove that $I = (g)$.

By the division algorithm, for any $f \in I$ there exist polynomials
$q, r \in \mathcal{F}[x]$ such that $f = gq + r$ where either $r = 0$ or
$\deg r < \deg g$. Since $f \in I$ and $gq \in I$, it follows that
$r = f - gq \in I$. But if $r \neq 0$ we have $\deg r < \deg g$ which
contradicts the definition of $g$ as the polynomial of least degree in $I$.
Therefore $r = 0$ so that $f = gq \in (g)$, and hence $I \subset (g)$. ■

Corollary Every ideal of $\mathcal{F}[x]$ is generated by a unique monic
polynomial.

Proof Since every ideal of $\mathcal{F}[x]$ is principal, suppose that
$(p) = (q)$ or, equivalently, $p\,\mathcal{F}[x] = q\,\mathcal{F}[x]$. Then
clearly $p \in p\,\mathcal{F}[x]$ so that $p = qf_1$ for some
$f_1 \in \mathcal{F}[x]$. Similarly, we see that $q = pf_2$ for some
$f_2 \in \mathcal{F}[x]$ and hence

$$p = qf_1 = pf_2f_1 .$$

Since $\mathcal{F}[x]$ is an integral domain it follows that $f_1f_2 = 1$. But
this means that $f_1$ and $f_2$ are units (i.e., constant polynomials), and hence
$q = cp$ for some $c \in \mathcal{F}$. Noting that $cp\,\mathcal{F}[x]$ is the same
as $p\,\mathcal{F}[x]$, we see that any ideal $p\,\mathcal{F}[x]$ may be written in
the form $cp\,\mathcal{F}[x]$ for arbitrary nonzero $c \in \mathcal{F}$. By
choosing $c$ to be the inverse of the leading coefficient of $p$, we have shown
that any ideal of $\mathcal{F}[x]$ has a unique monic generator. ■

In Section 6.2 we discussed the greatest common divisor of a collection of
polynomials. We now treat a related concept that the reader may be
wondering about. If $f, g \in \mathcal{F}[x]$ then, by the least common multiple
(or simply lcm) of $f$ and $g$, we mean the polynomial $m \in \mathcal{F}[x]$
such that $f \mid m$ and $g \mid m$, and if $m' \in \mathcal{F}[x]$ is another
polynomial that satisfies $f \mid m'$ and $g \mid m'$, then $m \mid m'$.

As a useful observation, note that if $f, g \in \mathcal{F}[x]$ and $f \mid g$,
then $g = fq$ for some $q \in \mathcal{F}[x]$. But $(f) = f\,\mathcal{F}[x]$ and
hence

$$(g) = g\,\mathcal{F}[x] = fq\,\mathcal{F}[x] \subset (f) .$$

In other words, $f \mid g$ implies that $(g) \subset (f)$.

Example 6.9 Let $A$ and $B$ be ideals of $\mathcal{F}[x]$. By Theorem 6.8 and
Example 6.8 we may write $A = h\,\mathcal{F}[x]$ and $B = k\,\mathcal{F}[x]$, and
also $A \cap B = m\,\mathcal{F}[x]$ and $A + B = d\,\mathcal{F}[x]$. We claim that
$d$ is a greatest common divisor of $h$ and $k$, and that $m$ is a least common
multiple of $h$ and $k$.

To see this, first note that since $h \in h\,\mathcal{F}[x] = A$, it follows that
$h = h + 0 \in A + B$, and hence $h = dh_1$ for some $h_1 \in \mathcal{F}[x]$.
Similarly, we must have $k = dk_1$ for some $k_1 \in \mathcal{F}[x]$. Therefore
$d \mid h$ and $d \mid k$ so that $d$ is a common divisor of $h$ and $k$. We must
show that if $d' \mid h$ and $d' \mid k$, then $d' \mid d$. Now, if $d' \mid h$ and
$d' \mid k$, then $A = (h) \subset (d')$ and $B = (k) \subset (d')$. But then
$A + B \subset (d')$ because for any $f + g \in A + B$ we have
$f \in A \subset (d')$ and $g \in B \subset (d')$, and therefore
$f + g \in (d')$ since $(d')$ is an ideal. This means that
$d \in A + B \subset (d')$ so that $d = d'p$ for some $p \in \mathcal{F}[x]$, and
hence $d' \mid d$.

Now note that $m \in A \cap B$ so that $m \in A$ implies $m = hm_1$ and
$m \in B$ implies $m = km_2$ for some polynomials $m_1, m_2 \in \mathcal{F}[x]$.
This means that $h \mid m$ and $k \mid m$ so that $m$ is a common multiple of $h$
and $k$. Next, note that if $h \mid m'$ then $(m') \subset (h) = A$, and if
$k \mid m'$ then $(m') \subset (k) = B$. Therefore we see that
$m' \in (m') \subset A \cap B = (m)$ so that $m' = mq$ for some
$q \in \mathcal{F}[x]$. Hence $m \mid m'$ and $m$ is a least common multiple of
$h$ and $k$. ∆

Greatest common divisors and least common multiples will be of 
considerable use to us in the next chapter. The following important theorem 
relates least common multiples and greatest common divisors. Rather than 
prove it directly, we simply note that it follows easily from Theorem 6.10 
below. 

Theorem 6.9 Suppose $h, k \in \mathcal{F}[x]$ and let $d$ and $m$ be the greatest
common divisor and least common multiple respectively of $h$ and $k$. Then
$hk = dm$.

Recall from Theorem 6.6 that any polynomial $h \in \mathcal{F}[x]$ is expressible
as a unique product of prime polynomials. If $h$ contains the prime factor
$g_i \in \mathcal{F}[x]$ repeated $r_i$ times, then we write $g_i^{r_i}$ as one of
the factors of $h$. Therefore we may write the decomposition of $h$ in the form
$h = \prod_i g_i^{r_i}$. If $k \in \mathcal{F}[x]$ is another polynomial, then it
may also be factored in the same manner as $k = \prod_j q_j^{s_j}$. In fact, we
may write both $h$ and $k$ as a product of the same factors if we allow the
exponent to be zero for any factor that does not appear in that expansion.
In other words, we may write $h = \prod_{i=1}^{n} p_i^{r_i}$ and
$k = \prod_{i=1}^{n} p_i^{s_i}$ where $r_i \geq 0$ and $s_i \geq 0$ for each
$i = 1, \dots, n$.

The next theorem contains Theorem 6.9 as an immediate and obvious 
corollary. 

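Theorem 6.10 Suppose that $h, k \in \mathcal{F}[x]$ and write

$$h = \prod_{i=1}^{n} p_i^{r_i} \qquad k = \prod_{i=1}^{n} p_i^{s_i}$$

where each $r_i \geq 0$ and each $s_i \geq 0$. For the given $r_i$ and $s_i$,
define the polynomials

$$
\alpha = \prod_{r_i > s_i} p_i^{r_i} \qquad
\beta = \prod_{r_i \leq s_i} p_i^{r_i} \qquad
\gamma = \prod_{s_i < r_i} p_i^{s_i} \qquad
\delta = \prod_{s_i \geq r_i} p_i^{s_i}
$$

so that $h = \alpha\beta$ and $k = \gamma\delta$. Then the least common multiple
$m$ of $h$ and $k$ is given by

$$m = \alpha\delta = \prod_{i=1}^{n} p_i^{\max(r_i, s_i)}$$

and the greatest common divisor $d$ is given by

$$d = \beta\gamma = \prod_{i=1}^{n} p_i^{\min(r_i, s_i)} .$$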

Proof Since the expressions for $h$ and $k$ are given in terms of the same set of
prime polynomials $p_i$, a moment's thought should make it clear that the least
common multiple is given by

$$m = \prod_{i=1}^{n} p_i^{\max(r_i, s_i)} .$$

Formally, we see that $h \mid m$ and $k \mid m$, and if $m'$ is another polynomial
such that $h \mid m'$ and $k \mid m'$, then by Theorem 6.6 again we can write
$m' = \prod_{i=1}^{n} p_i^{t_i}$ where we must have $t_i \geq r_i$ and
$t_i \geq s_i$ for each $i = 1, \dots, n$ in order that $m'$ be a common multiple
of $h$ and $k$. But this means that $m \mid m'$ so that $m$ is the least
common multiple of $h$ and $k$. In any case, $m$ is exactly the same as the
product $\alpha\delta$.

That the greatest common divisor is given by
$\prod_{i=1}^{n} p_i^{\min(r_i, s_i)} = \beta\gamma$ follows from a similar
argument. ■
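As a small worked illustration of Theorems 6.9 and 6.10 (the particular polynomials are chosen here for the example and do not come from the text), take $p_1 = x - 1$, $p_2 = x + 2$, $p_3 = x - 3$ with exponents $(r_1, r_2, r_3) = (2, 1, 0)$ and $(s_1, s_2, s_3) = (1, 3, 1)$:

```latex
\begin{align*}
h &= (x-1)^2 (x+2), & k &= (x-1)(x+2)^3 (x-3), \\
\alpha &= (x-1)^2, & \beta &= (x+2)^1 (x-3)^0 = x+2, \\
\gamma &= (x-1)^1, & \delta &= (x+2)^3 (x-3), \\
m &= \alpha\delta = (x-1)^2 (x+2)^3 (x-3), & d &= \beta\gamma = (x-1)(x+2), \\
hk &= (x-1)^3 (x+2)^4 (x-3) = dm .
\end{align*}
```

Each prime appears in $m$ with the larger of its two exponents and in $d$ with the smaller, so the exponents in $dm$ add up to $r_i + s_i$, which is the content of Theorem 6.9.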

By Theorem 1.13, the quotient structure $\mathcal{F}[x]/(p)$ is a ring for any
$p$, and we now show that it is actually a field for appropriate $p$.

Theorem 6.11 Suppose $p \in \mathcal{F}[x]$, and let $I = (p)$. Then
$\mathcal{F}[x]/I$ is a field if and only if $p$ is prime over $\mathcal{F}$.

Proof We first show that if $p$ is reducible over $\mathcal{F}$, then
$\mathcal{F}[x]/I$ is not a field. (This is the contrapositive to the statement
that if $\mathcal{F}[x]/I$ is a field, then $p$ must be prime.) To see this,
assume that $p = ab$ where neither $a$ nor $b$ is a unit, and each is of degree
less than that of $p$. From the definition of a principal ideal, we see that the
degree of any nonzero polynomial in $I$ must be greater than or equal to
$\deg p$, and hence neither $a$ nor $b$ can be an element of $I$. Since $I$ is
the zero element of $\mathcal{F}[x]/I$, we see that $I + a$ and $I + b$ must both
be nonzero elements of $\mathcal{F}[x]/I$. But then

$$(I + a)(I + b) = I + ab = I + p = I$$

where we used the fact that $p \in I$. This shows that the product of two
nonzero elements of $\mathcal{F}[x]/I$ yields the zero element of
$\mathcal{F}[x]/I$, and thus the set of nonzero elements of $\mathcal{F}[x]/I$ is
not closed under multiplication. Hence $\mathcal{F}[x]/I$ is neither a division
ring nor an integral domain, so it certainly is not a field.

Conversely, suppose that $p$ is prime. Since $\mathcal{F}[x]$ is a commutative
ring, it follows that $\mathcal{F}[x]/I$ is also a commutative ring, because
$(I + a)(I + b) = I + ab = I + ba = (I + b)(I + a)$ for any
$a, b \in \mathcal{F}[x]$. The identity element in $\mathcal{F}[x]/I$ is easily
seen to be $I + 1$ where 1 is the unit element for the field $\mathcal{F}$.
Therefore, all that remains is to show the existence of a multiplicative
inverse for each nonzero element in $\mathcal{F}[x]/I$.

If $I + f$ is any nonzero element in $\mathcal{F}[x]/I$, then $f \notin I$ so
that $p \nmid f$. Since $p$ is prime, its only divisors are units and its
associates, and therefore the greatest common divisor of $p$ and $f$ is a unit
(i.e., $p$ and $f$ are relatively prime). Applying Corollary 1 of Theorem 6.5
we see there exist $u, v \in \mathcal{F}[x]$ such that $up + vf = 1$. Then
$1 - vf = up \in I$ and hence

$$I + 1 = I + up + vf = I + vf = (I + v)(I + f) .$$

This shows that $I + v$ is a multiplicative inverse of $I + f$. ■
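The polynomial $v$ in the last step of this proof can be computed by running the Euclidean algorithm on $p$ and $f$ while keeping track of the multiplier of $f$. The following sketch does this under assumptions that are ours rather than the text's: the field is $\mathbb{Q}$, polynomials are coefficient lists with the constant term first, and `trim`, `add`, `mul`, `divmod_poly` and `inverse_mod` are hypothetical helper names.

```python
from fractions import Fraction

def trim(p):
    """Copy p as Fractions and drop trailing zero coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))

def mul(a, b):
    a, b = trim(a), trim(b)
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(f, g):
    """(q, r) with f = g*q + r and r = 0 or deg r < deg g."""
    r, g = trim(f), trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        for i, c in enumerate(g):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def inverse_mod(f, p):
    """Return v with v*f = 1 modulo p, i.e. (I + v)(I + f) = I + 1 for I = (p)."""
    # Invariant: r0 = v0*f and r1 = v1*f modulo p.
    r0, r1 = trim(p), divmod_poly(f, p)[1]
    v0, v1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        v0, v1 = v1, add(v0, mul([-c for c in q], v1))
    if len(r0) != 1:                       # gcd is not a unit
        raise ValueError("f and p are not relatively prime")
    return divmod_poly([c / r0[0] for c in v0], p)[1]

# In Q[x]/(x^2 + 1), the inverse of I + (1 + x) is I + (1/2 - x/2).
print(inverse_mod([1, 1], [1, 0, 1]))
```

If $p$ is reducible and $f$ shares a factor with it, the gcd is not a unit and the function raises an error, which mirrors the first half of the proof: such an $I + f$ is a zero divisor and has no inverse.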
Exercises

1. Referring to Theorem 6.7, show that $\{I + c : c \in \mathcal{F}\}$ is a
subfield of $\mathcal{F}[x]/I$ isomorphic to $\mathcal{F}$.

2. Let $p = 1 + x^2 \in \mathbb{R}[x]$ and let $I = (p)$.

(a) Show $\mathbb{R}[x]/I$ is a field.

(b) Show $\mathbb{R}[x]/I$ is isomorphic to $\mathbb{C}$. [Hint: Justify defining
the mapping $\theta: \mathbb{R}[x]/I \to \mathbb{C}$ by
$\theta(I + (a + bx)) = a + ib$. Show that $\theta$ is bijective and preserves
addition. To show that $\theta$ preserves multiplication, note that

$$(I + (a + bx))(I + (c + dx)) = I + (ac + (ad + bc)x + bdx^2) .$$

Write this in the form $I + (u + vx)$ by following the first part of the proof
of Theorem 6.7. Now show that

$$\theta[(I + (a + bx))(I + (c + dx))] = \theta(I + (a + bx))\,\theta(I + (c + dx)) .]$$

3. Suppose $f, g \in \mathcal{F}[x]$. Prove that $(f) = (g)$ if and only if $f$ and
$g$ are associates.

4. Suppose $f, g \in \mathcal{F}[x]$. Prove or disprove the following:

(a) If $(f) = (g)$, then $\deg f = \deg g$.

(b) If $\deg f = \deg g$, then $(f) = (g)$.

(c) If $f \in (g)$ and $\deg f = \deg g$, then $(f) = (g)$.

5. Find the greatest common divisor and least common multiple of the 
following pairs of polynomials: 

(a) $(x - 1)(x + 2)^2$ and $(x + 2)(x - 4)$.

(b) (x - 2)\x - 3)\x - i) and (x - l)(x - 2)(x - 3)^. 

(c) (x2 + l)(x2 - 1) and (x + i)\x^ - 1). 

6. (a) Suppose $f_1, \dots, f_n \in \mathcal{F}[x]$, and let
$I = f_1\mathcal{F}[x] + \cdots + f_n\mathcal{F}[x]$ be the set of all polynomials
of the form $g = f_1g_1 + \cdots + f_ng_n$ where $g_i \in \mathcal{F}[x]$. Show
that $I$ is an ideal. This is called the ideal generated by
$\{f_1, \dots, f_n\}$.

(b) Show, in particular, that $\mathcal{F}[x]$ is an ideal generated by $\{1\}$.
This is called the unit ideal.

(c) Let $d$ be the unique monic generator of $I$. Show that $d$ divides each of
the $f_i$.

(d) If $c \in \mathcal{F}[x]$ divides each of the $f_i$, show that $c \mid d$.

(e) Suppose $\{f_1, f_2, f_3\}$ generates the unit ideal. Show that we can always
find polynomials $f_{ij} \in \mathcal{F}[x]$ such that

$$
\begin{vmatrix}
f_1 & f_2 & f_3 \\
f_{21} & f_{22} & f_{23} \\
f_{31} & f_{32} & f_{33}
\end{vmatrix} = 1 .
$$

[Hint: Show there exists $g_1, g_2, g_3 \in \mathcal{F}[x]$ such that
$\sum g_if_i = 1$, and let $a = \gcd\{g_1, g_2\}$. Next, show there exists
$h_1, h_2 \in \mathcal{F}[x]$ such that $(g_1/a)h_1 + (g_2/a)h_2 = 1$. Now use the
polynomials $g_i$, $h_i$ and $a$ to form the $f_{ij}$.]

6.4 POLYNOMIALS OVER ALGEBRAICALLY CLOSED FIELDS 

We now turn to a discussion of polynomials over the fields $\mathbb{R}$ and
$\mathbb{C}$. These are well worth considering in more detail since most
practical applications in mathematics and physics deal with these two special
cases. By way of terminology, a field $\mathcal{F}$ is said to be algebraically
closed if every polynomial $f \in \mathcal{F}[x]$ with $\deg f > 0$ has at least
one zero (or root) in $\mathcal{F}$.

Our next theorem is called the Fundamental Theorem of Algebra. While 
most proofs of this theorem involve the theory of complex variables, this 
result is so fundamental to our work that we present a proof in Appendix A 
that depends only on some relatively elementary properties of metric spaces. 
Basically, if the reader knows that a continuous function defined on a compact 
space takes its maximum and minimum values on the space, then there should 
be no problem understanding the proof. However, if the reader does not even 
know what a compact space is, then Appendix A presents all of the necessary 
formalism for a reasonably complete understanding of the concepts involved. 

Theorem 6.12 (Fundamental Theorem of Algebra) The complex number 
field C is algebraically closed. 

Proof See Appendix A. ■

Let $p = a_0 + a_1x + \cdots + a_nx^n$ be a polynomial of degree $n \geq 1$ over
an algebraically closed field $\mathcal{F}$. Then there exists an element
$\alpha_1 \in \mathcal{F}$ such that $p(\alpha_1) = 0$. Hence applying the factor
theorem (Corollary to Theorem 6.4) we have

$$p = (x - \alpha_1)q$$

where $q$ is a polynomial of degree $n - 1$. Again, if $n - 1 > 0$, we see that
$q$ has a zero $\alpha_2$ in $\mathcal{F}$, and continuing this process we obtain

$$p = c(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$$

where $c$ is a unit of $\mathcal{F}$. We thus see that any polynomial of degree
$n \geq 1$ over an algebraically closed field has exactly $n$ roots (although
they need not be distinct). We repeat this statement as part of the next
theorem.

Theorem 6.13 Let $\mathcal{F}$ be an algebraically closed field. Then every prime
polynomial $p \in \mathcal{F}[x]$ has (up to a unit factor) the form
$x - \alpha$ where $\alpha \in \mathcal{F}$. Moreover, every monic polynomial
$f \in \mathcal{F}[x]$ can be factored into the form

$$f = \prod_{i=1}^{n} (x - \alpha_i)$$

where each $\alpha_i \in \mathcal{F}$.

Proof Let $p \in \mathcal{F}[x]$ be prime. Since $\mathcal{F}$ is algebraically
closed, there exists $\alpha \in \mathcal{F}$ such that $p(\alpha) = 0$. By the
factor theorem, $x - \alpha$ must be a factor of $p$. But $p$ is prime so its
only factors are its associates and units. This proves the first part of the
theorem.

Now let $f \in \mathcal{F}[x]$ be of degree $n \geq 1$. The second part of the
theorem is essentially obvious from the first part and Theorem 6.6. However,
we may proceed as follows. Since $\mathcal{F}$ is algebraically closed there
exists $\alpha_1 \in \mathcal{F}$ such that $f(\alpha_1) = 0$, and hence by the
factor theorem,

$$f = (x - \alpha_1)q_1$$

where $q_1 \in \mathcal{F}[x]$ and $\deg q_1 = n - 1$ (Theorem 6.2(b)). Now, by
the algebraic closure of $\mathcal{F}$ there exists $\alpha_2 \in \mathcal{F}$
such that $q_1(\alpha_2) = 0$, and therefore

$$q_1 = (x - \alpha_2)q_2$$

where $\deg q_2 = n - 2$. It is clear that we can continue this process a total
of $n$ times, finally arriving at

$$f = c(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$$

where $c \in \mathcal{F}$ is a unit. In particular, $c = 1$ if $q_{n-1}$ is
monic. ■

While Theorem 6.13 shows that any polynomial of degree $n$ over an
algebraically closed field has exactly $n$ (not necessarily distinct) roots, a
more general result is the following.

Theorem 6.14 Any polynomial $p \in \mathcal{F}[x]$ of degree $n \geq 1$ over
$\mathcal{F}$ has at most $n$ roots in $\mathcal{F}$.

Proof We proceed by induction on the degree $n$ of $p$. If $n = 1$, then
$p = a_0 + a_1x$ so that $-a_0/a_1$ is the unique root of $p$, and the theorem
thus holds in this case. Now assume that $n > 1$ and that the theorem holds for
all polynomials of degree less than $n$. If $p$ has no roots, then the
conclusion of the theorem is valid, so we assume that $p$ has at least one root
$c$. Then $(x - c) \mid p$ so that $p = (x - c)q$ for some
$q \in \mathcal{F}[x]$ with $\deg q = n - 1$ (Theorem 6.2(b)). By our induction
hypothesis, $q$ has at most $n - 1$ roots in $\mathcal{F}$, so the proof will be
finished if we can show that $p$ has no roots in $\mathcal{F}$ other than $c$
and the roots of $q$. Suppose that $b \in \mathcal{F}$ is such that
$p(b) = (b - c)q(b) = 0$. Since the field $\mathcal{F}$ can have no zero
divisors (Exercise 1.5.12), it must be true that either $b - c = 0$ or
$q(b) = 0$. In other words, if $p(b) = 0$, then either $b = c$ or else $b$ is a
root of $q$. ■

Corollary Every polynomial $p$ of degree $n \geq 1$ over an algebraically closed
field $\mathcal{F}$ has $n$ roots in $\mathcal{F}$.

Proof While this was proved in Theorem 6.13, we repeat it here in a slightly
different manner. As was done in the proof of Theorem 6.14, we proceed by
induction on the degree of $p$. The case $n = 1$ is true as above, so we assume
that $n > 1$. Since $\mathcal{F}$ is algebraically closed, there exists at least
one root $c \in \mathcal{F}$ such that $p = (x - c)q$ where $\deg q = n - 1$. By
our induction hypothesis, $q$ has $n - 1$ roots in $\mathcal{F}$ which are also
clearly roots of $p$. It therefore follows that $p$ has at least
$n - 1 + 1 = n$ roots in $\mathcal{F}$, while Theorem 6.14 shows that $p$ has at
most $n$ roots in $\mathcal{F}$. Therefore $p$ must have exactly $n$ roots in
$\mathcal{F}$. ■

While we proved in Theorem 6.12 that the field $\mathbb{C}$ is algebraically
closed, it is not true that $\mathbb{R}$ is algebraically closed. This should be
obvious because any quadratic equation of the form $ax^2 + bx + c = 0$ has
solutions given by the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

and if $b^2 - 4ac < 0$, then there is no solution for $x$ in the real number
system. (Recall that the quadratic formula follows by writing

$$0 = x^2 + bx/a + c/a = (x + b/2a)^2 - b^2/4a^2 + c/a$$

and solving for $x$.) However, in the case of $\mathbb{R}[x]$, we do have the
following result.

Theorem 6.15 Suppose $f = a_0 + a_1x + \cdots + a_nx^n \in \mathbb{R}[x]$. If
$\alpha \in \mathbb{C}$ is a root of $f$, then so is $\alpha^*$. Furthermore, if
$\alpha \neq \alpha^*$, then $(x - \alpha)(x - \alpha^*)$ is a factor of $f$.

Proof If $\alpha \in \mathbb{C}$ is a root of $f$, then
$a_0 + a_1\alpha + \cdots + a_n\alpha^n = 0$. Taking the complex conjugate of
this equation and remembering that each $a_i \in \mathbb{R}$, we obtain
$a_0 + a_1\alpha^* + \cdots + a_n\alpha^{*n} = 0$ so that $\alpha^*$ is also a
root of $f$. The second part of the theorem now follows directly from the
factor theorem. ■

Corollary Every prime polynomial in $\mathbb{R}[x]$ is (up to a unit factor)
either of the form $x - \alpha$ or $x^2 + ax + b$ where
$\alpha, a, b \in \mathbb{R}$ and $a^2 - 4b < 0$.

Proof Let $f \in \mathbb{R}[x]$ be prime, and let $\alpha \in \mathbb{C}$ be a
root of $f$ (that $\alpha$ exists follows from Theorem 6.13). Then $x - \alpha$
is a factor of $f$ so that if $\alpha \in \mathbb{R}$, then $f = c(x - \alpha)$
where $c \in \mathbb{R}$ (since $f$ is prime). But if
$\alpha \notin \mathbb{R}$, then $\alpha \in \mathbb{C}$ and
$\alpha^* \neq \alpha$ so that by Theorem 6.15, $f$ has the factor

$$(x - \alpha)(x - \alpha^*) = x^2 - (\alpha + \alpha^*)x + \alpha\alpha^* .$$

Writing $\alpha = u + iv$ we see that

$$-a = \alpha + \alpha^* = 2u \in \mathbb{R}$$

and

$$b = \alpha\alpha^* = u^2 + v^2 \in \mathbb{R}$$

so that $f$ has the form (up to a unit factor) $x^2 + ax + b$. Finally, note that

$$a^2 - 4b = 4u^2 - 4(u^2 + v^2) = -4v^2 < 0 .$$ ■
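A short worked instance may help fix the notation of this proof. The polynomial $x^3 - 1$ is our own choice of illustration and does not appear in the text; its nonreal roots are $\alpha = -\tfrac12 + \tfrac{\sqrt3}{2}i$ and $\alpha^*$, so that $u = -\tfrac12$ and $v = \tfrac{\sqrt3}{2}$.

```latex
\begin{align*}
x^3 - 1 &= (x - 1)(x - \alpha)(x - \alpha^*) && \text{in } \mathbb{C}[x], \\
(x - \alpha)(x - \alpha^*) &= x^2 - 2u\,x + (u^2 + v^2) = x^2 + x + 1, \\
a &= -2u = 1, \qquad b = u^2 + v^2 = \tfrac14 + \tfrac34 = 1, \\
a^2 - 4b &= -4v^2 = -3 < 0 .
\end{align*}
```

Hence $x^3 - 1 = (x - 1)(x^2 + x + 1)$ is the factorization into primes in $\mathbb{R}[x]$: one factor of each of the two types allowed by the corollary.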

Exercises 

1. Suppose $x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{C}[x]$ has zeros
$\alpha_1, \dots, \alpha_n$. Prove that $a_0 = \pm\alpha_1 \cdots \alpha_n$ and
$a_{n-1} = -(\alpha_1 + \cdots + \alpha_n)$.

2. Let $V_n \subset \mathcal{F}[x]$ denote the set of all polynomials of degree
$\leq n$, and let $\alpha_0, \alpha_1, \dots, \alpha_n \in \mathcal{F}$ be
distinct.

(a) Show that $V_n$ is a vector space over $\mathcal{F}$ with basis
$\{1, x, x^2, \dots, x^n\}$, and hence that $\dim V_n = n + 1$.

(b) For each $i = 0, \dots, n$, define the mapping $T_i: V_n \to \mathcal{F}$ by
$T_i(f) = f(\alpha_i)$. Show that the $T_i$ are linear functionals on $V_n$,
i.e., that $T_i \in V_n^*$.

(c) For each $k = 0, \dots, n$ define the polynomial

$$
p_k(x) = \frac{(x - \alpha_0) \cdots (x - \alpha_{k-1})(x - \alpha_{k+1}) \cdots (x - \alpha_n)}
{(\alpha_k - \alpha_0) \cdots (\alpha_k - \alpha_{k-1})(\alpha_k - \alpha_{k+1}) \cdots (\alpha_k - \alpha_n)}
= \prod_{i \neq k} \frac{x - \alpha_i}{\alpha_k - \alpha_i} .
$$

Show that $T_i(p_j) = \delta_{ij}$.

(d) Show that $p_0, \dots, p_n$ forms a basis for $V_n$, and hence that any
$f \in V_n$ may be written as

$$f = \sum_{i=0}^{n} f(\alpha_i)\,p_i .$$

(e) Now let $b_0, b_1, \dots, b_n \in \mathcal{F}$ be arbitrary, and define
$f = \sum b_ip_i$. Show that $f(\alpha_j) = b_j$ for $0 \leq j \leq n$. Thus
there exists a polynomial of degree $\leq n$ that takes on given values at
$n + 1$ distinct points.

(f) Now assume that $f, g \in \mathcal{F}[x]$ are of degree $\leq n$ and satisfy
$f(\alpha_j) = b_j = g(\alpha_j)$ for $0 \leq j \leq n$. Prove that $f = g$, and
hence that the polynomial defined in part (e) is unique. This is called the
Lagrange interpolation formula.

3. Suppose $Q \subset M_2(\mathbb{C})$ is the set of all complex matrices of the form




(a) Prove that $Q$ is a division ring (i.e., that the nonzero elements of $Q$
form a multiplicative group). $Q$ is called the ring of quaternions.

(b) Prove that $Q$ is not a field.

(c) Prove that $x^2 + 1 \in Q[x]$ has infinitely many roots in $Q$ (where 1
denotes the unit element of $Q$, i.e., the $2 \times 2$ identity matrix).

4. Prove that $f, g \in \mathbb{C}[x]$ are relatively prime if and only if they
have no root in common.



5. Let $D$ be the differentiation operator defined in Problem 6.1.4, and
suppose $f \in \mathbb{C}[x]$ is a monic polynomial. Prove that
$f = (x - \alpha_1) \cdots (x - \alpha_n)$ where
$\alpha_1, \dots, \alpha_n \in \mathbb{C}$ are distinct if and only if $f$ and
$Df$ are relatively prime.

6. If $f \in \mathcal{F}[x]$ has a root $\alpha$, and $f = (x - \alpha)^mg$ where
$g(\alpha) \neq 0$, then $\alpha$ is said to be a root of multiplicity $m$. In
other words, $m$ is the largest integer such that $(x - \alpha)^m \mid f$. Let
$\alpha$ be a root of $f \in \mathcal{F}[x]$ and assume that $\deg f \geq 1$.
Show that the multiplicity of $\alpha$ is $> 1$ if and only if
$Df(\alpha) = 0$, and hence that the multiplicity of $\alpha$ is 1 if
$Df(\alpha) \neq 0$. (See Problem 6.1.4 for the definition of $Df$.)

7. Show that the following polynomials have no multiple roots in C (see the 
previous problem for the definition of multiplicity): 

(a) x^ + X. 

(b) $x^5 - 5x + 1$.

(c) $x^2 + bx + c$ where $b, c \in \mathbb{C}$ and $b^2 - 4c \neq 0$.

6.5 THE FIELD OF QUOTIENTS

What we will do in this section is show how to construct a field out of the ring
$\mathcal{F}[x]$. Rather than talk about polynomials specifically, we use the
fact that $\mathcal{F}[x]$ is an integral domain and treat the problem on a more
general footing.

Notice that the set $\mathbb{Z}$ of all integers has the property that if
$ab = 0$ for some $a, b \in \mathbb{Z}$, then either $a$ or $b$ must equal 0.
Since $\mathbb{Z}$ is a commutative ring, this shows that $\mathbb{Z}$ is in
fact an integral domain. Also note though, for any nonzero $a \in \mathbb{Z}$
with $a \neq \pm 1$, we have $a^{-1} = 1/a \notin \mathbb{Z}$, so that
$\mathbb{Z}$ is not a field. However, if we enlarge the set $\mathbb{Z}$ to
include all of the rational numbers, then we do indeed obtain a field. What this
really entails is taking all pairs $a, b \in \mathbb{Z}$ (with $b \neq 0$) and
forming the object $a/b$ with the appropriate algebraic operations defined on
it. In this particular case, we say that $a/b = c/d$ if and only if $ad = bc$,
and we define the operations of addition and multiplication by

a/b + c/d = (ad + bc)/bd 

and 

(a/b)(c/d) = (ac)/(bd) . 

In order to generalize this result, we make the following definition. Let $D$
be an integral domain (i.e., a commutative ring with no zero divisors), let $D'$
denote the set of all nonzero elements of $D$, and let $Q$ be the set of all
ordered pairs

$$Q = \{ (a, b) \in D \times D' \} .$$

(You may think of $(a, b)$ as the quotient $a/b$.) We define a relation $\sim$
on $Q$ by $(a, b) \sim (c, d)$ if $ad = bc$, and we claim that this is an
equivalence relation (for example, 2/3 is "equivalent" to 8/12). To prove this,
we must verify the three requirements given in Section 0.3. First, for any
$(a, b) \in Q$ we have $(a, b) \sim (a, b)$ since $ab = ba$. Next, for any
$(a, b), (c, d) \in Q$ we see that $(a, b) \sim (c, d)$ implies $ad = bc$, and
hence $cb = da$ which thus implies $(c, d) \sim (a, b)$. Finally, suppose
$(a, b), (c, d), (e, f) \in Q$ where $(a, b) \sim (c, d)$ and
$(c, d) \sim (e, f)$. Then $ad = bc$ and $cf = de$, and therefore
$bde = bcf = adf$. But $D$ is commutative, and hence this is just $afd = bed$.
By assumption $d \neq 0$ so that, since $D$ is an integral domain, we must have
$af = be$ and thus $(a, b) \sim (e, f)$.

We are now in a position to show that any integral domain can be enlarged 
in a similar manner to form a field. By way of terminology, if there is a one- 
to-one homomorphism (i.e., an isomorphism) of a ring R into a ring R', then 
we say that R can be embedded in R'. Furthermore, if R and R' are both rings 
with unit elements 1 and 1' respectively, then we require that the embedding 
take 1 into 1'. The ring R' is also called an extension of R. 

The proof of the next theorem appears to be quite involved, but it is actu- 
ally nothing more than a long series of simple steps. 

Theorem 6.16 Every integral domain D can be embedded in a field. 

Proof Let D' be the nonzero elements of D, and let Q be the set of all ordered
pairs (a, b) ∈ D × D' as defined above. We let [a, b] denote the equivalence
class in Q of (a, b) as constructed above. In other words,

[a, b] = {(x, y) ∈ D × D' : (a, b) ~ (x, y)} .

We claim that the set ℱ_D of all such equivalence classes forms a field. To
prove this, we must first define addition and multiplication in ℱ_D.
Guided by the properties of Z, we define addition in ℱ_D by the rule

[a, b] + [c, d] = [ad + bc, bd] .

Since D is an integral domain, we know that bd ≠ 0 for any nonzero b, d ∈ D,
and hence [ad + bc, bd] ∈ ℱ_D. We still must show that this addition is
well-defined, i.e., if [a, b] = [a', b'] and [c, d] = [c', d'], then

[a, b] + [c, d] = [a', b'] + [c', d'] .

This is equivalent to showing that

[ad + bc, bd] = [a'd' + b'c', b'd']

or, alternatively, that 

(ad + bc)b'd' = bd(a'd' + b'c') . 

From [a, b] = [a', b'] we have ab' = ba', and similarly cd' = dc'. Therefore we 
indeed have 

(ad + bc)b'd' = adb'd' + bcb'd' = ab'dd' + bb'cd' = ba'dd' + bb'dc'
= bda'd' + bdb'c' = bd(a'd' + b'c') .

Since D is commutative, it should be clear that if c ≠ 0 then

[a, b] = [ac, bc] = [ca, cb] .

Therefore

[a, b] + [0, c] = [ac + b0, bc] = [ac, bc] = [a, b]

so that [0, c] is a zero element for addition. We now see that 

[a, b] + [-a, b] = [ab - ba, bb] = [0, b] 

and hence [-a, b] is the negative of [a, b]. The reader should now have no
trouble showing that ℱ_D is an abelian group under addition.

To complete our ring structure, we define multiplication in ℱ_D by the rule

[a, b][c, d] = [ac, bd] . 

As was the case with addition, the fact that b, d ≠ 0 means that bd ≠ 0, and
hence the product is an element of ℱ_D. We leave it to the reader to show that
the product is well-defined (see Exercise 6.5.1). It is also easy to see that
[x, x] is a unit element in ℱ_D for any nonzero x ∈ D, and that the nonzero
elements of ℱ_D (i.e., those of the form [a, b] with a ≠ 0) form an abelian group
under multiplication with the inverse of an element [a, b] given by [a, b]^-1 =
[b, a] (where [b, a] ∈ ℱ_D since a ≠ 0).

As to the ring axioms, we show only one of the distributive laws, and
leave the others to the reader. This will complete the proof that ℱ_D forms a
field. If [a, b], [c, d], [e, f] ∈ ℱ_D, then

[a, b]([c, d] + [e, f]) = [a, b][cf + de, df] = [acf + ade, bdf]
= [b(acf + ade), bbdf] = [(ac)(bf) + (bd)(ae), (bd)(bf)]
= [ac, bd] + [ae, bf] = [a, b][c, d] + [a, b][e, f] .

What we have accomplished up to this point is the construction of a field
ℱ_D from an arbitrary integral domain D. It still must be shown that D can be
embedded in ℱ_D. As was noted above, for any nonzero x, y ∈ D we have
[ax, x] = [ay, y] because (ax)y = x(ay). This means that we can denote the
element [ax, x] ∈ ℱ_D by [a, 1]. (It is important to realize that the ring D does
not necessarily contain a unit element, so that there need not exist a unit element
1 ∈ D. What we have just done is define the element [a, 1] ∈ ℱ_D. Everything
that follows in the remainder of this proof holds if we replace the symbol 1 by
an arbitrary nonzero element x ∈ D.)

We now define the mapping φ: D → ℱ_D by φ(a) = [a, 1] for all a ∈ D. If
φ(a) = φ(a'), then [a, 1] = [a', 1] so that a1 = 1a' and hence a = a', thus proving
that φ is one-to-one. (As we just mentioned, the symbol 1 could actually be
replaced by any x ≠ 0, since the fact that D is an integral domain then says that
if ax = xa', then x(a - a') = 0 which also implies that a = a'.) To finish the
proof, we need only show that φ is a homomorphism. But for any a, b ∈ D we
have

φ(a + b) = [a + b, 1] = [a1 + b1, 1·1] = [a, 1] + [b, 1] = φ(a) + φ(b)

and

φ(ab) = [ab, 1] = [ab, 1·1] = [a, 1][b, 1] = φ(a)φ(b) . ∎

The field constructed in this theorem is usually called the field of quotients
of D. If we start with the ring of integers Z, then this construction yields
the rational field Q. While we have shown that any integral domain D can be
embedded in its field of quotients, there can be other fields in which D can
also be embedded. However, it can be shown that ℱ_D is the "smallest" field in
which D can be embedded (see Exercise 6.5.2).
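As an illustrative aside, the construction can be checked numerically in the special case D = Z. The following Python sketch is not part of the formal development: the function names (equivalent, add, mul, embed) are our own, a class [a, b] is represented by any one of its pairs (a, b), and a handful of passing checks is of course no substitute for the proof above.

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) exactly when ad = bc."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """[a, b] + [c, d] = [ad + bc, bd], computed on representatives."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """[a, b][c, d] = [ac, bd], computed on representatives."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def embed(a):
    """The embedding a -> [a, 1] of D = Z into its field of quotients."""
    return (a, 1)

# Two representatives of each of two classes (2/3 = 8/12 and 5/7 = -10/-14).
p, p_alt = (2, 3), (8, 12)
q, q_alt = (5, 7), (-10, -14)
assert equivalent(p, p_alt) and equivalent(q, q_alt)

# Well-definedness: the result does not depend on the representatives used.
assert equivalent(add(p, q), add(p_alt, q_alt))
assert equivalent(mul(p, q), mul(p_alt, q_alt))

# [0, c] is a zero element, [x, x] is a unit element, [b, a] inverts [a, b].
assert equivalent(add(p, (0, 5)), p)
assert equivalent(mul(p, (4, 4)), p)
assert equivalent(mul(p, (3, 2)), (1, 1))

# The embedding preserves sums and products.
assert equivalent(embed(6 + 9), add(embed(6), embed(9)))
assert equivalent(embed(6 * 9), mul(embed(6), embed(9)))
```

Working with pairs and testing equivalence, rather than reducing to lowest terms, mirrors the proof: nothing in it requires a canonical representative of a class.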

Exercises 

1. Referring to the proof of Theorem 6.16, show that the product in ℱ_D is
well-defined.

2. Show that ℱ_D is the smallest field in which an integral domain D can be
embedded. In other words, show that if K is any field containing an
integral domain isomorphic to D, then K contains a field isomorphic to ℱ_D.
[Hint: For simplicity, assume that D is actually a subring of K. Now, for
any a, b ∈ D with b ≠ 0, show that the map φ: ℱ_D → K defined by
φ([a, b]) = ab^-1 is one-to-one and preserves addition and multiplication.
Thus φ is an isomorphism of ℱ_D onto a subfield of K.]

3. Show that ℱ_D obeys all of the axioms for a ring.

6.6 POLYNOMIALS OVER FINITE FIELDS *

With very few exceptions (e.g., Exercise 1.5.15), the fields we have been 
using (such as R and C) contain an infinite number of elements. However, it is
also possible to construct many fields that contain only a finite number of ele- 
ments. This section is meant to be only an introduction to the theory of finite 
fields. 

Recall from Example 1.11 that two integers a, b ∈ Z are said to be
congruent modulo n (where n ∈ Z⁺) if n|(a - b), and we write this as

a ≡ b (mod n) .

We also saw in Exercise 1.5.2 that for each n ∈ Z⁺, this defines an
equivalence relation on Z that decomposes Z into n distinct congruence
classes. We shall denote the congruence class (for a fixed n) of an integer
k ∈ Z by [k]. If there is any possible ambiguity as to the value of n under
discussion, we will write [k]_n. For example, if n = 5 we have

[2] = [7] = [-33] = {..., -8, -3, 2, 7, 12, . . .} . 

We also refer to any of the integers in [k] as a representative of [k]. For 
example, 2 is the smallest positive representative of the class [7] (or the class 
[-33] etc.). We emphasize that [k] is a subset of Z for each k.

From Example 1.11, we say that the collection

{[0], [1], [2], . . . , [n - 1]}

forms a complete set of congruence classes modulo n, and we denote this
collection by Zn. We first show that Zn can be made into an abelian group.
For any [a], [b] ∈ Zn we define a group "addition" operation [a] ⊕ [b] ∈ Zn
by

[a] ⊕ [b] = [a + b] .

(Note that the symbol ⊕ is used in an entirely different context when we talk
about direct sums.) It should be obvious that [0] will serve as the additive
identity element since [a] ⊕ [0] = [a + 0] = [a] and [0] ⊕ [a] = [0 + a] = [a].
Noting that, for example with n = 5 again, we have [3] = [18] and [4] = [-1],
we must be sure that [3] ⊕ [4] = [18] ⊕ [-1]. Clearly this is true because [3] ⊕
[4] = [7] = [2] and [18] ⊕ [-1] = [17] = [2]. In other words, we must be sure
that this addition operation is well-defined. That this is in fact the case is
included in the next theorem.

Theorem 6.17 The set Zn is an abelian group with respect to the operation ⊕
defined above.

Proof We leave it to the reader to show that ⊕ is indeed well-defined (see
Exercise 6.6.1). As to the group properties, we prove associativity, leaving the
rest of the proof to the reader (see Exercise 6.6.2). We have

[a] ⊕ ([b] ⊕ [c]) = [a] ⊕ [b + c] = [a + (b + c)] = [(a + b) + c]
= [a + b] ⊕ [c] = ([a] ⊕ [b]) ⊕ [c] . ∎

By virtue of this theorem, Zn is called the group of integers modulo n (or
simply mod n). We can also define another operation on Zn that is analogous
to multiplication. Thus, we define the "multiplication" operation ⊗ on Zn by

[a] ⊗ [b] = [ab] .

(Again, this symbol should not be confused with the tensor product to be
introduced in Chapter 11.) For example, if n = 6 we have [2] ⊗ [5] = [10] =
[4] and [3] ⊗ [-4] = [-12] = [0]. The closest analogue to Theorem 6.17 that
we have for ⊗ is the following.

Theorem 6.18 The operation ⊗ defined above on Zn is well-defined, obeys
the associative and commutative laws, and has [1] as the identity element.

Proof See Exercise 6.6.3. ∎
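The numerical examples above, and the meaning of "well-defined", can be spot-checked by machine. The Python sketch below is only an illustration under our own naming (rep, add_classes, mul_classes and well_defined are not from the text); it represents a class [k] by its smallest nonnegative representative and tests finitely many alternative representatives, so it supports but does not replace Exercises 6.6.1 and 6.6.3.

```python
def rep(k, n):
    """Smallest nonnegative representative of the class [k] modulo n."""
    return k % n

def add_classes(a, b, n):
    """[a] (+) [b] = [a + b], computed from arbitrary representatives a, b."""
    return rep(a + b, n)

def mul_classes(a, b, n):
    """[a] (x) [b] = [ab], computed from arbitrary representatives a, b."""
    return rep(a * b, n)

# The examples used in the text.
assert rep(7, 5) == rep(-33, 5) == 2
assert add_classes(3, 4, 5) == add_classes(18, -1, 5) == 2
assert mul_classes(2, 5, 6) == 4
assert mul_classes(3, -4, 6) == 0

def well_defined(n, shifts=range(-3, 4)):
    """Replace a by a + i*n and b by b + j*n and confirm nothing changes."""
    for a in range(n):
        for b in range(n):
            for i in shifts:
                for j in shifts:
                    a2, b2 = a + i * n, b + j * n
                    if add_classes(a2, b2, n) != add_classes(a, b, n):
                        return False
                    if mul_classes(a2, b2, n) != mul_classes(a, b, n):
                        return False
    return True

assert all(well_defined(n) for n in range(1, 13))
```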

Since [1] is the identity element for ⊗, it is easy to see that [0] has no
multiplicative inverse in Zn, and hence Zn cannot possibly form a group under ⊗.
Let us denote the set Zn - {[0]} = {[1], [2], . . . , [n - 1]} by Zn⁺. It turns out
that for some (but not all) values of n, Zn⁺ will in fact form a group with
respect to ⊗. We will leave specific examples of this to the exercises at the
end of this section.

With the operations ⊕ and ⊗ defined, it is now easy to see that Zn actually
forms a commutative ring. All we must do is verify the axioms given in
Section 1.4. We will show that the first half of axiom (R8) is obeyed, and
leave it to the reader to verify the rest of the ring axioms (see Exercise 6.6.4).
We therefore have

[a] ⊗ ([b] ⊕ [c]) = [a] ⊗ [b + c] = [a(b + c)] = [ab + ac]
= [ab] ⊕ [ac] = ([a] ⊗ [b]) ⊕ ([a] ⊗ [c]) .

Now consider the ring Zn and assume that n is not prime. Then we may
write n = rs where r, s > 1. But then [r] ⊗ [s] = [rs] = [n] = [0] where [r], [s] ≠
[0]. Since [0] is the zero element of Zn, this shows that Zn is not an integral
domain if n is not prime. On the other hand, suppose that n = p is prime. We
claim that Zp is an integral domain. For, suppose [a] ∈ Zp and [a] ≠ [0]. We
may assume that a is the smallest positive representative of the equivalence
class [a], and hence 0 < a < p. Now assume that [b] ∈ Zp is such that [a] ⊗ [b] =
[ab] = [0] (where we again choose b with 0 ≤ b < p to be the smallest nonnegative
representative of the class [b]). Then by definition we have p|ab. But p is
prime so (by Theorem 0.9) this implies that either p|a or p|b. Since 0 < a < p,
it is impossible for p to divide a, and therefore p|b. Since 0 ≤ b < p, we must
have b = 0, and thus Zp is an integral domain. This proves the next result.

Theorem 6.19 The ring Zn is an integral domain if and only if n is prime. 
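For small n the statement of Theorem 6.19 is easy to confirm by brute force. The sketch below is a hedged illustration only: zero_divisors and is_prime are names introduced here, classes are represented by the integers 0, . . . , n - 1, and the check covers just a finite range of n.

```python
def zero_divisors(n):
    """Nonzero classes [a] in Z_n with [a][b] = [0] for some nonzero [b]."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

def is_prime(n):
    """Trial division; adequate for the small values used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# n = 6 = 2 * 3 is not prime, so Z_6 has zero divisors; Z_7 has none.
assert zero_divisors(6) == [2, 3, 4]
assert zero_divisors(7) == []

# Z_n has no zero divisors exactly when n is prime (checked for 2 <= n < 60).
for n in range(2, 60):
    assert (zero_divisors(n) == []) == is_prime(n)
```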

Noting that Zn consists of n equivalence classes, we now claim that Zn is 
in fact a field if n is prime. This is an immediate consequence of the following 
general result. Recall that a field is a commutative ring with identity element 
in which the nonzero elements form a multiplicative group (i.e., a commuta- 
tive division ring). Furthermore, any field is necessarily an integral domain 
(see Exercise 1.5.6 or 1.5.12). 

Theorem 6.20 Every finite integral domain is a field. 

Proof Let D be a finite integral domain (which is commutative by
definition). We must show that 1 ∈ D, and that every nonzero a ∈ D has a
multiplicative inverse that is also in D. In other words, we must show that for
every nonzero a ∈ D there exists b ∈ D such that ab = 1 ∈ D. Let {x_1, . . . ,
x_n} denote all the elements of D, and consider the set {ax_1, . . . , ax_n} where
a ∈ D and a ≠ 0. If ax_i = ax_j for i ≠ j, then a(x_i - x_j) = 0 which (since D has no
zero divisors) implies that x_i = x_j, contradicting the assumption that i ≠ j. Thus
ax_1, . . . , ax_n are all distinct. Since D contains n elements, it follows that in
fact we have D = {ax_1, . . . , ax_n}. In other words, every y ∈ D can be written
in the form ax_i = x_i a for some i = 1, . . . , n. In particular, we must have a =
ax_i0 for some i0 = 1, . . . , n. Then for any y = x_i a ∈ D we have

y x_i0 = (x_i a)x_i0 = x_i(ax_i0) = x_i a = y

so that x_i0 may be taken as the identity element 1 in D. Finally, since we have
now shown that 1 ∈ D, it follows that 1 = ax_j for some particular j = 1, . . . , n.
Defining b = x_j yields 1 = ab and completes the proof. ∎

Corollary Zn is a field if and only if n is prime.

We now turn our attention to the question of whether there exist any finite
fields that do not contain a prime number of elements. As with groups, we
refer to the number of elements in a finite field as its order. This is not
surprising since any field is a ring, and any ring is an additive group. We will
frequently denote a finite field by F rather than by ℱ.

Example 6.10 Let S2(ℱ) ⊂ M2(ℱ) denote the set of all matrices of the form

(  x   y )
( -y   x )

with x, y ∈ ℱ. We will show that S2(ℱ) is a field when ℱ = Z3 but not when
ℱ = Z5.

We leave it as a simple exercise for the reader to show that if A, B ∈
S2(ℱ), then A + B and AB are also in S2(ℱ). Furthermore, AB = BA so that
S2(ℱ) is commutative. Note that S2(ℱ) also contains the zero and identity
matrices, and if A ∈ S2(ℱ), then so is -A. Thus S2(ℱ) is easily seen to be a
subring of M2(ℱ). We now consider the problem of inverses. If

A = (  x   y )
    ( -y   x )

has inverse

A^-1 = (  a   b )
       ( -b   a )

then AA^-1 = I yields the two simultaneous equations

ax - by = 1
ay + bx = 0 .

Since x, y, a, b ∈ ℱ we can (formally) solve these for a and b to obtain

a = x(x^2 + y^2)^-1
b = -y(x^2 + y^2)^-1 .

The element (x^2 + y^2)^-1 will exist as long as x^2 + y^2 ≠ 0.

In Z3 we have x, y ∈ {0, 1, 2} and it is easy to see by direct calculation
that x^2 + y^2 ≠ 0 as long as (x, y) ≠ (0, 0). For example, if x = 1 and y = 2 we
have x^2 + y^2 = 1 + 1 = 2. Thus every nonzero matrix in S2(Z3) is invertible,
and hence S2(Z3) is a field with 9 elements.

On the other hand, in Z5 we see that 1^2 + 2^2 = 0 so that the matrix with x =
1 and y = 2 is not invertible, and hence S2(Z5) is not a field. /
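The "direct calculation" in Example 6.10 can be delegated to a short program. The sketch below is an illustration under stated assumptions: it takes the matrices of S2 to have rows (x, y) and (-y, x), which is consistent with the equations ax - by = 1 and ay + bx = 0 above, and the helper names s2_is_field and inverse are our own. The three-argument pow with exponent -1 computes a modular inverse and requires Python 3.8 or later.

```python
from itertools import product

def s2_is_field(p):
    """True if x^2 + y^2 != 0 in Z_p for every (x, y) != (0, 0)."""
    return all((x * x + y * y) % p != 0
               for x, y in product(range(p), repeat=2)
               if (x, y) != (0, 0))

def inverse(x, y, p):
    """Entries (a, b) of the inverse, from a = x/(x^2 + y^2), b = -y/(x^2 + y^2)."""
    d = pow(x * x + y * y, -1, p)   # raises ValueError if x^2 + y^2 = 0 in Z_p
    return (x * d) % p, (-y * d) % p

assert s2_is_field(3)          # every nonzero matrix in S2(Z3) is invertible
assert not s2_is_field(5)      # 1^2 + 2^2 = 0 in Z5

# Confirm the two simultaneous equations for every nonzero matrix over Z3.
for x, y in product(range(3), repeat=2):
    if (x, y) != (0, 0):
        a, b = inverse(x, y, 3)
        assert (a * x - b * y) % 3 == 1
        assert (a * y + b * x) % 3 == 0
```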

Example 6.11 Since Z3 is a field, we can consider polynomials in Z3[x].
These may be used to generate a field of order 9 as follows. We define the set
F9 ⊂ Z3[x] consisting of the nine polynomials of degree ≤ 1 with coefficients
in Z3:

F9 = {0, 1, 2, x, x + 1, x + 2, 2x, 2x + 1, 2x + 2} .

It is easy to see that this set is closed under ordinary polynomial addition. For
example, remembering that our scalars lie in Z3 we have (x + 1) + (2x + 1) =
2. However, we must be careful in defining multiplication. This is because, for
example, (x + 1)(2x + 1) = 2x^2 + 1 ∉ F9 even though it is in Z3[x]. To ensure
that multiplication is closed, we multiply as usual in Z3[x] and then reduce
modulo x^2 + 1. In other words, we subtract off multiples of x^2 + 1. For
example, we have

(x + 1)(2x + 1) = 2x^2 + 1            (in Z3[x])
                = 2(x^2 + 1) + 2
                = 2                   (in F9) .

As another example,

(2x + 1)(x) = 2x^2 + x                (in Z3[x])
            = 2(x^2 + 1) + (x + 1)
            = x + 1                   (in F9) .

Using the constant polynomials 0 and 1 as the 0 and 1 elements of a ring, it is
easy to show that F9 forms a commutative ring. That F9 in fact is a field
follows from the observation that each nonzero element of F9 has the inverse
shown below:

Element:   1    2    x     x+1   x+2   2x    2x+1   2x+2
Inverse:   1    2    2x    x+2   x+1   x     2x+2   2x+1

We leave verification of these facts to the reader. /
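Readers who wish to check the table mechanically may find the following Python sketch useful. It is an illustration only: an element c0 + c1 x of F9 is stored as the pair (c0, c1), the names add9, mul9 and inverses are ours, and the product rule simply uses x^2 = -1, which is what reduction modulo x^2 + 1 amounts to.

```python
P = 3  # coefficients lie in Z3

def add9(u, v):
    """Sum of c0 + c1*x and d0 + d1*x with coefficients reduced mod 3."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul9(u, v):
    """Product reduced modulo x^2 + 1, i.e. with x^2 replaced by -1."""
    (a0, a1), (b0, b1) = u, v
    return ((a0 * b0 - a1 * b1) % P, (a0 * b1 + a1 * b0) % P)

F9 = [(c0, c1) for c1 in range(P) for c0 in range(P)]
ONE = (1, 0)

# The worked examples: (x + 1) + (2x + 1) = 2, (x + 1)(2x + 1) = 2,
# and (2x + 1)(x) = x + 1.
assert add9((1, 1), (1, 2)) == (2, 0)
assert mul9((1, 1), (1, 2)) == (2, 0)
assert mul9((1, 2), (0, 1)) == (1, 1)

# Every nonzero element has exactly one inverse; compare with the table.
inverses = {u: [v for v in F9 if mul9(u, v) == ONE]
            for u in F9 if u != (0, 0)}
assert all(len(vs) == 1 for vs in inverses.values())
assert inverses[(0, 1)] == [(0, 2)]   # x      <-> 2x
assert inverses[(1, 1)] == [(2, 1)]   # x + 1  <-> x + 2
assert inverses[(1, 2)] == [(2, 2)]   # 2x + 1 <-> 2x + 2
```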

Let G be any group, and let e be the identity element of G. If there exists
an element x ∈ G such that every element in G is a power of x, then G is said
to be a cyclic group generated by x. The cyclic group generated by x is
usually denoted by ⟨x⟩. If we consider the set of all powers of the generator x,
then this set will consist of either all distinct elements, or else some elements
will be repeated. In the case of repeated elements, there will exist a positive
integer m > 0 such that x^m = e while no smaller nonzero power of x is also
equal to e. For example, if i < j and x^i = x^j, we have e = x^i(x^i)^-1 = x^j x^-i =
x^(j-i). Then let m be the smallest nonzero value of all such differences j - i.

We say that G = {e, x, x^2, . . . , x^(m-1)} is a cyclic group of order m, and
we denote this group by Cm. We are using the letter e to denote the identity
element of a general group so that we may distinguish between the
multiplicative identity (usually written as 1) of some groups and the additive
identity (usually written as 0) of other groups. Note that given any k ∈ Z we
may write k = qm + r (where 0 ≤ r < m), and hence

x^k = (x^m)^q x^r = e^q x^r = x^r .

Thus all powers of x can indeed be written in terms of the first m powers of x.
In the case that all powers of x are distinct, then the group

G = { . . . , x^-2, x^-1, e, x, x^2, . . . }

contains an infinite number of elements. We denote such an infinite cyclic
group by C∞. Of importance to us is the fact that many apparently diverse
groups are isomorphic to cyclic groups (either finite or infinite). We will see a
simple example of this below.

If x ∈ G and m > 0 is the smallest integer such that x^m = e, then m is
called the order of x, and will also be denoted by o(x). (This should not be
confused with the order of G which is the number of elements in G. It is for
this reason that o(G) is frequently denoted by |G|.) If o(x) < o(G), then it is
easy to see that ⟨x⟩ is simply a subgroup of G, and furthermore that o(x) =
o(⟨x⟩) (see Exercise 6.6.9).

Example 6.12 Let us show that the set of integers Z under addition is
isomorphic to C∞ = ⟨x⟩. We define the mapping φ: Z → C∞ by φ(n) = x^n. Clearly
φ is a homomorphism of groups since

φ(n + m) = x^(n+m) = x^n x^m = φ(n)φ(m) .

(Note that n + m in φ(n + m) denotes the "product" of two group elements in
Z while φ(n)φ(m) denotes the product of two elements in C∞.) It should also
be obvious that φ is surjective because every x^k ∈ C∞ is just the image of k ∈
Z. Finally, φ is injective since Ker φ = {0}. We have thus constructed an
isomorphism of Z onto C∞.

We leave it to the reader to show that Zm is isomorphic to Cm (see
Exercise 6.6.7). /

Let ℱ be any field. Since ℱ is closed under addition and contains the
multiplicative identity element 1, we see that for any n ∈ Z⁺, the sum 1 + · · · + 1
of n 1's is also in ℱ. For example, 2 = 1 + 1 ∈ ℱ, as is 3 = 1 + 1 + 1 and so on.
(However, it is important to stress that in an arbitrary field these elements
need not be distinct.) Therefore the positive integers form a (not necessarily
infinite) cyclic subgroup ⟨1⟩ of the additive group of ℱ. In other words

⟨1⟩ = {0, 1, 2, . . . }

where each n ∈ ⟨1⟩ denotes 1 + · · · + 1 (n times) in ℱ.

Now consider the special case where F is a finite field. Note that since ⟨1⟩
is a subgroup of F, Theorem 1.9 tells us that o(⟨1⟩)|o(F). The number o(⟨1⟩) is
called the characteristic of F (see Section 1.5). For example, if F = Zp, then
⟨1⟩ = {0, 1, 2, . . . , p - 1} = Zp and hence the characteristic of Zp is o(⟨1⟩) =
p = o(F). The characteristic of an infinite field may or may not be finite.

Example 6.13 Let us show that the characteristic of any finite field must be a
prime number (if it is nonzero). By definition, the characteristic of F is the
smallest m ∈ Z⁺ such that m·1 = 1 + · · · + 1 = 0. If m ≠ 0 is not prime, then
we may write m = rs for some integers 0 < r < m and 0 < s < m. But then we
have rs = 0 which implies (since we are in a field) that either r = 0 or s = 0. In
either case this contradicts the definition of m as the least positive integer such
that m = 0. Thus m must be prime. Note that this proof actually applies to any
finite integral domain with an identity element. /

We now wish to prove that if a finite field exists, then its order must be a
prime power. For example, we have seen that Zn is a field if and only if n is
prime, and the field F9 discussed above is of order 9 = 3^2. Our claim will
follow as an immediate corollary of the next theorem. The reader should recall
that the direct product of two groups was defined in Exercise 1.1.5. The direct
product of a finite number of groups follows by an obvious induction
argument.

Example 6.14 Consider the cyclic groups C2 = ⟨x⟩ = {1, x} and C3 = ⟨y⟩ =
{1, y, y^2}. Then the product C2 × C3 consists of the six elements

C2 × C3 = {(1, 1), (1, y), (1, y^2), (x, 1), (x, y), (x, y^2)} .

To show that this product group is isomorphic to C6, let z = (x, y) ∈ C2 × C3.
Then by definition of the group product in C2 × C3 we have z^2 = (1, y^2), z^3 =
(x, 1), z^4 = (1, y), z^5 = (x, y^2), z^6 = (1, 1). Therefore C2 × C3 = {z, z^2, . . . , z^6}
which is just a cyclic group of order 6. /
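The computation of the powers of z can be mirrored in a few lines of Python. In this hedged sketch (the names power and order are ours) an element (x^i, y^j) of a product of cyclic groups is stored as its tuple of exponents (i, j), so that multiplying group elements becomes adding exponents modulo the orders of the factors.

```python
def power(z, k, orders):
    """Exponent tuple of z^k in C_m1 x C_m2 x ..., with orders = (m1, m2, ...)."""
    return tuple((e * k) % m for e, m in zip(z, orders))

def order(z, orders):
    """Smallest k > 0 with z^k equal to the identity (all exponents zero)."""
    identity = tuple(0 for _ in orders)
    k = 1
    while power(z, k, orders) != identity:
        k += 1
    return k

z = (1, 1)                                 # z = (x, y) in C2 x C3
assert power(z, 2, (2, 3)) == (0, 2)       # z^2 = (1, y^2)
assert power(z, 3, (2, 3)) == (1, 0)       # z^3 = (x, 1)
assert order(z, (2, 3)) == 6               # so z generates all of C2 x C3

# By contrast, no element of C2 x C2 has order 4, so C2 x C2 is not cyclic.
assert max(order((i, j), (2, 2)) for i in range(2) for j in range(2)) == 2
```

The contrast with C2 × C2 illustrates why the relative primality hypothesis appears in Exercise 6.6.8.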

Theorem 6.21 If F is a finite field of characteristic p, then for some r ≥ 1, the
additive group of F is isomorphic to the r-fold direct product (Cp)^r. Thus
o(F) = p^r.

Proof We leave it to the reader to show that Zp is isomorphic to a subfield of
F, and hence that F may be considered to be a vector space V over the field
Zp. Since F is finite, r = dim V must also be finite. By the corollary to
Theorem 2.8, V is isomorphic to (Zp)^r = Zp × · · · × Zp. Noting that Zp =
{0, 1, . . . , p - 1} = ⟨1⟩ is isomorphic to the (additive) cyclic group Cp, it
follows that V is isomorphic to (Cp)^r (i.e., the set of all r-tuples of field
elements). Thus V has p^r elements. ∎

Corollary The order of a finite field must be a prime power.

Proof This follows from Example 6.13 and Theorem 6.21. ∎

We now comment briefly on the construction of finite fields. What we
shall do is generalize the procedure demonstrated in Example 6.11 where we
constructed the field F9. Recall that the problem came in defining a closed
multiplication in Z3[x]. A more general way to view the solution to this
problem is to define an equivalence relation ~ on Z3[x] by the requirement
that a ~ b if (x^2 + 1)|(a - b). Note that x^2 + 1 is prime in Z3[x]. We may now
take the elements of F9 to be the equivalence classes of the nine polynomials
that were previously used in Example 6.11 to define F9. Another way to say
this is that these nine polynomials of degree ≤ 1 form a complete set of
representatives of the classes. In this case, addition and multiplication are
defined as expected by

[a] + [b] = [a + b]

and

[a][b] = [ab] .

Note that if (x^2 + 1)|(a - b), then the remainder of a divided by x^2 + 1 must
be the same as the remainder of b divided by x^2 + 1. Since Z3 is finite, there
are only a finite number of polynomials of any given degree in Z3[x]. Hence
there can be only a finite number of distinct remainders, because the degree of
each remainder must be less than that of x^2 + 1. Referring to Theorem 6.7 and
its proof, a moment's thought should convince you that all we are doing is
considering the cosets Z3[x]/(x^2 + 1) where (x^2 + 1) denotes the principal
ideal generated by k = x^2 + 1 ∈ Z3[x]. This is because any p ∈ Z3[x]/(k) =
Z3[x]/I is of the form I + h where h ∈ Z3[x]. But h = qk + r for some
q ∈ Z3[x] and where deg r < deg k, and hence qk ∈ I. Therefore p must
actually be of the form I + r, and thus there can be only as many distinct such
p as there are distinct r.

The next theorem shows that this approach works in general.

Theorem 6.22 Let k ∈ Zp[x] be a prime polynomial of degree r, and define
an equivalence relation on Zp[x] by a ~ b if and only if k|(a - b). Then the
corresponding set of equivalence classes in Zp[x] is a field of order p^r.

Proof First note that if a_i ∈ Zp, then there are p^r distinct polynomials of
the form a_0 + a_1 x + · · · + a_(r-1) x^(r-1). This set of p^r polynomials (which
consists of the zero polynomial together with all distinct polynomials of
degrees 0, 1, 2, . . . , r - 1) forms a complete set of representatives of the
classes, and hence there are p^r classes in all. Since these equivalence classes
are just the cosets Zp[x]/(k) where k is prime, it follows from Theorem 6.11
that Zp[x]/(k) is a field. ∎
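For small p and r the construction of Theorem 6.22 can be carried out by brute force. The Python sketch below is an illustration only, under assumptions of our own: a polynomial is a list of coefficients with the constant term first, primality of a monic polynomial is tested by trial division by every monic polynomial of lower positive degree, and all function names are ours. It is far too slow for large fields and is not the method of the text.

```python
from itertools import product

def poly_mod(a, k, p):
    """Remainder of a on division by the monic polynomial k over Z_p."""
    a = [c % p for c in a]
    dk = len(k) - 1
    while len(a) - 1 >= dk:
        lead, shift = a[-1], len(a) - 1 - dk
        for i, c in enumerate(k):
            a[shift + i] = (a[shift + i] - lead * c) % p
        a.pop()                      # the leading coefficient is now zero
    return a + [0] * (dk - len(a))   # pad to exactly deg k coefficients

def poly_mul(a, b, p):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def monic_polys(deg, p):
    for low in product(range(p), repeat=deg):
        yield list(low) + [1]

def is_prime_poly(k, p):
    """No monic polynomial of degree 1, ..., deg(k)//2 divides k."""
    r = len(k) - 1
    return all(any(poly_mod(k, d, p))
               for deg in range(1, r // 2 + 1)
               for d in monic_polys(deg, p))

def find_prime_poly(p, r):
    return next(k for k in monic_polys(r, p) if is_prime_poly(k, p))

def field_elements(p, r):
    return [list(c) for c in product(range(p), repeat=r)]

def is_field(k, p):
    """Every nonzero class modulo k has a multiplicative inverse."""
    r = len(k) - 1
    elems = field_elements(p, r)
    one = [1] + [0] * (r - 1)
    return all(any(poly_mod(poly_mul(u, v, p), k, p) == one for v in elems)
               for u in elems if any(u))

k = find_prime_poly(3, 2)
assert k == [1, 0, 1]                    # the polynomial 1 + x^2 of Example 6.11
assert len(field_elements(3, 2)) == 9    # p^r classes
assert is_field(k, 3)
assert not is_field([2, 0, 1], 3)        # x^2 + 2 = (x + 1)(x + 2) is not prime
```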

One consequence of this theorem is that to construct a field of order p^r we
need only find a prime polynomial of degree r in Zp[x]. While this is easy
enough to do in most common cases, it is fairly hard to prove that there exists
at least one prime polynomial for every choice of p and r. We refer the
interested reader to, e.g., the very readable book by Biggs (1985).

Exercises

1. Show that the operation ⊕ defined on Zn is well-defined. In other words,
show that if [a_1] = [a_2] and [b_1] = [b_2] then [a_1 + b_1] = [a_2 + b_2].

2. Finish the proof that Zn forms an additive group. 

3. Prove Theorem 6.18. 

4. Finish the proof that Zn forms a ring. 

5. Finish the details in Example 6.10.

6. Finish the details in Example 6.11.

7. Prove that Zm is isomorphic to Cm.

8. If m and n are relatively prime positive integers, prove that Cm × Cn is
isomorphic to Cmn. [Hint: Suppose Cm = ⟨x⟩ and Cn = ⟨y⟩. Let z =
(x, y) ∈ Cm × Cn have order r. Show that r = mn and then conclude that
Cm × Cn must be a cyclic group.]

9. Let G be a group, and suppose ⟨x⟩ ⊂ G. If o(x) < o(G), show that ⟨x⟩ is a
subgroup of G and that o(x) = o(⟨x⟩).

10. Fill in the details in the proof of Theorem 6.22. 

11. For each of the following expressions in Z5, write the answer as [0], [1],
[2], [3] or [4]:

(a) [3] ⊕ [4]

(b) [2] ⊕ [-7]

(c) [17] ⊕ [76]

(d) [3] ⊗ [4]

(e) [2] ⊗ [-7]

(f) [17] ⊗ [76]

(g) [3] ⊗ ([2] ⊕ [4])

(h) ([3] ⊗ [2]) ⊕ ([3] ⊗ [4])

12. Repeat the previous problem in Z6.

13. (a) Which elements of Z4 are zero divisors?
(b) Which elements of Z10 are zero divisors?

14. Show that ([2], [0]) is a zero divisor in Z3 × Z3.

15. (a) Show that 1 + x + x^2 ∈ Z2[x] is prime over Z2.
(b) Show that 1 + x^2 ∈ Z3[x] is prime over Z3.
[Hint: Use the factor theorem.]

16. (a) Show that 1 + + ∈ Z2[x] is prime over Z2, and use this to
construct a field of order 8.

(b) What is the order of its multiplicative group? 

17. Prove that for every prime number p there exist fields of order p^2 and p^3.

18. For which of the following primes p can we construct a field of order p^2
by using the polynomial 1 + x^2?

p = 3,5,7, 11, 13, 19,23 . 

Describe the multiplicative group for the first two cases in which the field 
can be constructed. 



CHAPTER 7 



Linear Transformations 
and Polynomials 



We now turn our attention to the problem of finding the basis in which a given
linear transformation has the simplest possible representation. Such a
representation is frequently called a canonical form. Although we would almost
always like to find a basis in which the matrix representation of an operator is
diagonal, this is in general impossible to do. Basically, in this chapter as well
as in Chapters 8 and 10, we will try to find the general conditions that
determine exactly what form it is possible for a representation to take.

In the present chapter, we focus our attention on eigenvalues and
eigenvectors, which are probably the most important characterization of a linear
operator that is available to us. We also treat the triangular form theorem from
two distinct viewpoints. Our reason for this is that in this chapter we discuss 
both quotient spaces and nilpotent transformations, and the triangular form 
theorem is a good application of these ideas. However, since we also treat this 
theorem from an entirely different (and much simpler) point of view in the 
next chapter, the reader should feel free to skip Sections 7.10 to 7.12 if 
desired. (We also point out that Section 7.9 on quotient spaces is completely 
independent of the rest of this chapter, and may in fact be read immediately 
after Chapter 2.) 

In Chapter 8 we give a complete discussion of canonical forms of matrices 
under similarity. All of the results that we prove in the present chapter for 
canonical forms of operators also follow from the development in Chapter 8. 
The reason for treating the "operator point of view" as well as the "matrix
point of view" is that the proof techniques and way of thinking can be quite
different. The matrix point of view leads to more constructive and insightful
proofs, while the operator point of view leads to techniques that are more
likely to extend to infinite-dimensional analogs (although there is no complete
extension to the infinite-dimensional version).

7.1 MINIMAL POLYNOMIALS 

Let f = a_0 + a_1 x + · · · + a_n x^n ∈ ℱ[x] be any polynomial in the indeterminate x.
Then, given any linear operator T ∈ L(V), we define the linear operator f(T) ∈
L(V) as the polynomial in the operator T defined by substitution as

f(T) = a_0 1 + a_1 T + · · · + a_n T^n

where 1 is the identity transformation on V. Similarly, given any matrix A ∈
M_m(ℱ), we define the matrix polynomial f(A) by

f(A) = a_0 I + a_1 A + · · · + a_n A^n

where now I is the m x m identity matrix. If T is such that f(T) = 0, then we
say that T is a root or zero of the polynomial f. This terminology also applies
to a matrix A such that f(A) = 0.

If A ∈ M_m(ℱ) is the representation of T ∈ L(V) relative to some (ordered)
basis for V, then (in view of Theorem 5.13) we expect that f(A) is the
representation of f(T). This is indeed the case.

Theorem 7.1 Let A be the matrix representation of an operator T ∈ L(V).
Then f(A) is the representation of f(T) for any polynomial f ∈ ℱ[x].

Proof This is Exercise 7.1.1. ∎

The basic algebraic properties of polynomials in either operators or 
matrices are given by the following theorem. 

Theorem 7.2 Suppose T ∈ L(V) and let f, g ∈ ℱ[x]. Then

(a) f(T)T = Tf(T).

(b) (f ± g)(T) = f(T) ± g(T).

(c) (fg)(T) = f(T)g(T).

(d) (cf)(T) = cf(T) for any c ∈ ℱ.

Furthermore, these same results also hold for any matrix representation A ∈
M_n(ℱ).

Proof In view of Theorem 6.1, we leave this as an easy exercise for the
reader (see Exercise 7.1.2). ∎

From this theorem and the fact that the ring of polynomials is commuta- 
tive, it should be clear that any two polynomials in the operator T (or matrix 
A) also commute. 
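A small numerical check may help fix these ideas. The Python sketch below is illustrative only and uses names of our own choosing: a polynomial is a list of coefficients [a_0, . . . , a_n], a matrix is a list of rows with integer entries, and poly_eval computes f(A) by Horner's rule. The particular matrix A and polynomials f, g are arbitrary choices.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_eval(coeffs, A):
    """f(A) = a_0 I + a_1 A + ... + a_n A^n, evaluated by Horner's rule."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_add(mat_mul(result, A), mat_scale(c, identity(n)))
    return result

def poly_mul(f, g):
    """Coefficients of the product polynomial fg."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

A = [[1, 2], [3, 4]]
f = [1, 0, 2]        # f = 1 + 2x^2
g = [-3, 1]          # g = -3 + x
fA, gA = poly_eval(f, A), poly_eval(g, A)

assert mat_mul(fA, A) == mat_mul(A, fA)                   # Theorem 7.2(a)
assert poly_eval(poly_mul(f, g), A) == mat_mul(fA, gA)    # Theorem 7.2(c)
assert mat_mul(fA, gA) == mat_mul(gA, fA)                 # f(A), g(A) commute
```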

This discussion is easily generalized as follows. Let A be any algebra over
ℱ with unit element e, and let f = a_0 + a_1 x + · · · + a_n x^n be any polynomial in
ℱ[x]. Then for any α ∈ A we define

f(α) = a_0 e + a_1 α + · · · + a_n α^n ∈ A .

If f(α) = 0, then α is a root of f and we say that α satisfies f. We now show
that in fact every α ∈ A satisfies some nontrivial polynomial in ℱ[x]. Recall
that by definition, an algebra A is automatically a vector space over ℱ.

Theorem 7.3 Let A be an algebra (with unit element e) of dimension m over
ℱ. Then every element α ∈ A satisfies some nontrivial polynomial in ℱ[x] of
degree at most m.

Proof Since dim A = m, it follows that for any α ∈ A, the m + 1 elements e,
α, α^2, . . . , α^m ∈ A must be linearly dependent (Theorem 2.6). This means
there exist scalars a_0, a_1, . . . , a_m ∈ ℱ not all equal to zero such that

a_0 e + a_1 α + · · · + a_m α^m = 0 .

But then α satisfies the nontrivial polynomial

f = a_0 + a_1 x + · · · + a_m x^m ∈ ℱ[x]

which is of degree at most m. ∎

Corollary Let V be a finite-dimensional vector space over ℱ, and suppose
dim V = n. Then any T ∈ L(V) satisfies some nontrivial polynomial g ∈ ℱ[x]
of degree at most n^2.

Proof By Theorem 5.8, L(V) is an algebra over ℱ, and by Theorem 5.4 we
have dim L(V) = dim L(V, V) = n^2. The corollary now follows by direct
application of Theorem 7.3. ∎

While this corollary asserts that any T G L(V) always satisfies some poly- 
nomial g G ^[x] of degree at most n^, we shall see a little later on that g can 
be chosen to have degree at most n (this is the famous Cayley-Hamilton theo- 
rem). 

Now, for a given $T \in L(V)$, consider the set of all
$f \in \mathcal{F}[x]$ with the property that $f(T) = 0$. This set is not
empty by virtue of the previous corollary. Hence (by well-ordering) we may
choose a polynomial $p \in \mathcal{F}[x]$ of least degree with the property
that $p(T) = 0$. Such a polynomial is called a minimal polynomial for T over
$\mathcal{F}$. (We will present an alternative definition in terms of ideals
in Section 7.4.)
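The argument of Theorem 7.3 can be turned into a direct (if inefficient) computation for matrices: list the powers I, A, A², . . . and stop at the first power that is a linear combination of the earlier ones. The sketch below does this with exact rational arithmetic. It is only an illustration under our own assumptions: the function names solve and minimal_polynomial are hypothetical, and this is not the method the text develops later (see Theorem 8.10).

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(columns, target):
    """Return x with sum_j x[j]*columns[j] == target, or None if impossible."""
    rows, k = len(target), len(columns)
    # Augmented matrix [columns | target], reduced by Gauss-Jordan elimination.
    M = [[Fraction(columns[j][i]) for j in range(k)] + [Fraction(target[i])]
         for i in range(rows)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if any(M[i][k] != 0 for i in range(r, rows)):
        return None                      # inconsistent system
    x = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        x[c] = M[i][k]
    return x

def minimal_polynomial(A):
    """Coefficients [c_0, ..., c_{k-1}, 1] of the monic minimal polynomial of A."""
    n = len(A)
    flat = lambda M: [v for row in M for v in row]
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    while True:                          # stops after at most n^2 steps (Theorem 7.3)
        nxt = mat_mul(powers[-1], A)
        x = solve([flat(P) for P in powers], flat(nxt))
        if x is not None:                # A^k = sum_i x_i A^i
            return [-c for c in x] + [Fraction(1)]
        powers.append(nxt)

print(minimal_polynomial([[1, 2], [3, 2]]))   # coefficients of x^2 - 3x - 4
```

Because the search proceeds by increasing degree, the first dependence found has least degree, which is exactly the defining property of the minimal polynomial.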

Theorem 7.4 Let V be finite-dimensional and suppose $T \in L(V)$. Then there
exists a unique monic polynomial $m \in \mathcal{F}[x]$ such that
$m(T) = 0$ and, in addition, if $q \in \mathcal{F}[x]$ is any other
polynomial such that $q(T) = 0$, then $m \mid q$.

Proof The existence of a minimal polynomial $p \in \mathcal{F}[x]$ was shown
in the previous paragraph, so all that remains is to prove the uniqueness of
a particular (i.e., monic) minimal polynomial. Suppose

$$p = a_0 + a_1 x + \cdots + a_n x^n$$

so that deg p = n. Multiplying p by $a_n^{-1}$ we obtain a monic polynomial
$m \in \mathcal{F}[x]$ with the property that $m(T) = 0$. If $m'$ is another
distinct monic polynomial of degree n with the property that $m'(T) = 0$,
then $m - m'$ is a nonzero polynomial of degree less than n (since the
leading terms cancel) that is satisfied by T, thus contradicting the
definition of n. This proves the existence of a unique monic minimal
polynomial.

Now let q be another polynomial satisfied by T. Applying the division
algorithm we have $q = mg + r$ where either $r = 0$ or deg r < deg m.
Substituting T into this equation and using the fact that $q(T) = 0$ and
$m(T) = 0$ we find that $r(T) = 0$. But if $r \neq 0$, then we would have a
polynomial r with deg r < deg m such that $r(T) = 0$, contradicting the
definition of m. We must therefore have $r = 0$ so that $q = mg$, and hence
$m \mid q$. ∎

From now on, all minimal polynomials will be assumed to be monic
unless otherwise noted. Furthermore, in Section 7.3 we will show (as a
consequence of the Cayley-Hamilton theorem) the existence of a minimal
polynomial for matrices. It then follows as a consequence of Theorem 7.1 that
any $T \in L(V)$ and its corresponding matrix representation A both have the
same minimal polynomial (since $m(T) = 0$ if and only if $m(A) = 0$).

Recall that $T \in L(V)$ is invertible if there exists an element
$T^{-1} \in L(V)$ such that $TT^{-1} = T^{-1}T = 1$ (where 1 is the identity
element of L(V)). It is interesting to note that for any invertible
$T \in L(V)$, its inverse $T^{-1}$ is actually a polynomial in T. This fact
is essentially shown in the proof of the next theorem.

Theorem 7.5 Let V be finite-dimensional over $\mathcal{F}$. Then
$T \in L(V)$ is invertible if and only if the constant term in the minimal
polynomial for T is not equal to zero.

Proof Let the minimal polynomial for T over $\mathcal{F}$ be

$$m = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n .$$

We first assume that $a_0 \neq 0$. Since m is the minimal polynomial for T,
we have

$$m(T) = a_0 1 + a_1 T + \cdots + a_{n-1}T^{n-1} + T^n = 0$$

and hence multiplying by $a_0^{-1}$ and using Theorem 7.2 yields

$$0 = 1 + a_0^{-1}T(a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1})$$

or

$$\begin{aligned}
1 &= T[-a_0^{-1}(a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1})] \\
  &= [-a_0^{-1}(a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1})]T .
\end{aligned}$$

This shows that
$T^{-1} = -a_0^{-1}(a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1})$,
and hence T is invertible.

Now suppose T is invertible, but that $a_0 = 0$. Then we have

$$\begin{aligned}
0 &= a_1 T + a_2 T^2 + \cdots + a_{n-1}T^{n-1} + T^n \\
  &= (a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1})T .
\end{aligned}$$

Multiplying from the right by $T^{-1}$ yields

$$0 = a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1}$$

and hence T satisfies the polynomial
$p = a_1 + a_2 x + \cdots + a_{n-1}x^{n-2} + x^{n-1} \in \mathcal{F}[x]$.
But deg p = n - 1 < n, which contradicts the definition of m as the minimal
polynomial. Therefore we must have $a_0 \neq 0$. ∎

Corollary Let V be finite-dimensional over $\mathcal{F}$, and assume that
$T \in L(V)$ is invertible. Then $T^{-1}$ is a polynomial in T over
$\mathcal{F}$.

Proof If T is invertible, then
$m(T) = a_0 1 + a_1 T + \cdots + a_{n-1}T^{n-1} + T^n = 0$ with
$a_0 \neq 0$. Multiplying by $a_0^{-1}$ then shows that

$$T^{-1} = -a_0^{-1}(a_1 1 + a_2 T + \cdots + a_{n-1}T^{n-2} + T^{n-1}) .$$ ∎
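The formula for the inverse in the corollary can be checked numerically. The sketch below takes the coefficients of a monic polynomial m with m(A) = 0 and nonzero constant term and assembles the inverse from powers of A. The function name inverse_from_polynomial is our own, and the sample matrix together with its polynomial x² - 3x - 4 are assumptions chosen for illustration (one can verify directly that this matrix satisfies that polynomial).

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_from_polynomial(m, A):
    """Given m = [a_0, a_1, ..., a_{n-1}, 1] with m(A) = 0 and a_0 != 0,
    return -a_0^{-1} (a_1 I + a_2 A + ... + A^{n-1})."""
    if m[0] == 0:
        raise ValueError("constant term is zero, so A is not invertible")
    n = len(A)
    q = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(m[1:]):            # Horner's rule on a_1 + a_2 x + ... + x^{n-1}
        q = mat_mul(q, A)
        for i in range(n):
            q[i][i] += a
    return [[-v / m[0] for v in row] for row in q]

A = [[1, 2], [3, 2]]
m = [-4, -3, 1]                          # m(x) = x^2 - 3x - 4, satisfied by A
A_inv = inverse_from_polynomial(m, A)
print(A_inv)
print(mat_mul(A, A_inv))                 # should be the identity matrix
```

Here the inverse works out to (A - 3I)/4, a polynomial of degree one in A, as the corollary predicts.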

While we have so far shown the existence of minimal polynomials, most 
readers would be hard-pressed at this point to actually find one given any par- 
ticular linear operator. Fortunately, we will discover a fairly general method 
for finding the minimal polynomial of a matrix in Chapter 8 (see Theorem 
8.10). 

As we stated earlier, V will always denote a finite-dimensional vector
space over a field $\mathcal{F}$. In addition, we will let $1 \in L(V)$
denote the identity transformation on V (i.e., the unit element of L(V)),
and we let $I \in M_n(\mathcal{F})$ be the identity matrix.

Exercises 

1. Prove Theorem 7.1.

2. Prove Theorem 7.2.

3. Let V be finite-dimensional over $\mathcal{F}$, and suppose
$T \in L(V)$ is singular. Prove there exists a nonzero $S \in L(V)$ such
that $ST = TS = 0$.

4. Suppose V has a basis $\{e_1, e_2\}$. If $T \in L(V)$, then
$Te_i = \sum_j e_j a_{ji}$ for some $(a_{ij}) \in M_2(\mathcal{F})$. Find a
nonzero polynomial of degree 2 in $\mathcal{F}[x]$ satisfied by T.

5. Repeat the previous problem, but let dim V = 3 and find a polynomial of 
degree 3. 

6. Let $a \in \mathcal{F}$ be fixed, and define the linear transformation
$T \in L(V)$ by $T(v) = av$. This is called the scalar mapping belonging to
a. Show that T is the scalar mapping belonging to a if and only if the
minimal polynomial for T is $m(x) = x - a$.

7.2 EIGENVALUES AND EIGENVECTORS

We now make a very important definition. If $T \in L(V)$, then an element
$\lambda \in \mathcal{F}$ is called an eigenvalue (also called a
characteristic value or characteristic root) of T if there exists a nonzero
vector $v \in V$ such that $T(v) = \lambda v$. In this case, we call the
vector v an eigenvector (or characteristic vector) belonging to the
eigenvalue $\lambda$. Note that while an eigenvector is nonzero by
definition, an eigenvalue may very well be zero.

Throughout the remainder of this chapter we will frequently leave off the 
parentheses around vector operands. In other words, we sometimes write Tv 
rather than T(v). This simply serves to keep our notation as uncluttered as 
possible. 

An important criterion for the existence of an eigenvalue of T is the 
following. 

Theorem 7.6 A linear operator $T \in L(V)$ has eigenvalue
$\lambda \in \mathcal{F}$ if and only if $\lambda 1 - T$ is singular.

Proof Suppose $\lambda 1 - T$ is singular. By definition, this means there
exists a nonzero $v \in V$ such that $(\lambda 1 - T)v = 0$. But this is
just $Tv = \lambda v$. The converse follows by reading the same steps in
reverse. ∎

Note, in particular, that 0 is an eigenvalue of T if and only if T is
singular. In an exactly analogous manner, we say that an element
$\lambda \in \mathcal{F}$ is an eigenvalue of a matrix
$A \in M_n(\mathcal{F})$ if there exists a nonzero (column) vector
$v \in \mathcal{F}^n$ such that $Av = \lambda v$, and we call v an
eigenvector of A belonging to the eigenvalue $\lambda$. Given a basis
$\{e_i\}$ for $\mathcal{F}^n$, we can write this matrix eigenvalue equation
in terms of components as

$$\sum_{j=1}^{n} a_{ij}v_j = \lambda v_i , \qquad i = 1, \ldots, n .$$

Now suppose $T \in L(V)$ and $v \in V$. If $\{e_1, \ldots, e_n\}$ is a basis
for V, then $v = \sum_i v_i e_i$ and hence

$$T(v) = T\Big(\sum_i v_i e_i\Big) = \sum_i v_i T(e_i) = \sum_{i,j} e_j a_{ji} v_i$$

where $A = (a_{ij})$ is the matrix representation of T relative to the basis
$\{e_i\}$. Using this result, we see that if $T(v) = \lambda v$, then

$$\sum_{i,j} e_j a_{ji} v_i = \lambda \sum_j v_j e_j$$

and hence equating components shows that
$\sum_i a_{ji} v_i = \lambda v_j$. We thus see that (as expected) the
isomorphism between L(V) and $M_n(\mathcal{F})$ (see Theorem 5.13) shows
that $\lambda$ is an eigenvalue of the linear transformation T if and only
if $\lambda$ is also an eigenvalue of the corresponding matrix
representation A. Using the notation of Chapter 5, we can say that
$T(v) = \lambda v$ if and only if $[T]_e [v]_e = \lambda [v]_e$.



Example 7.1 Let us find all of the eigenvectors and associated eigenvalues
of the matrix

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} .$$

This means that we must find a vector v = (x, y) such that
$Av = \lambda v$. In matrix notation, this equation takes the form

$$\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \lambda \begin{pmatrix} x \\ y \end{pmatrix}$$

or

$$\begin{pmatrix} 1 - \lambda & 2 \\ 3 & 2 - \lambda \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 0 .$$

This is equivalent to the system

$$\begin{aligned}
(1 - \lambda)x + 2y &= 0 \\
3x + (2 - \lambda)y &= 0
\end{aligned} \qquad (*)$$

Since this homogeneous system of equations has a nontrivial solution if and
only if the determinant of the coefficient matrix is zero (Corollary to
Theorem 4.13), we must have

$$\begin{vmatrix} 1 - \lambda & 2 \\ 3 & 2 - \lambda \end{vmatrix}
= \lambda^2 - 3\lambda - 4 = (\lambda - 4)(\lambda + 1) = 0 .$$

We thus see that the eigenvalues are $\lambda = 4$ and $\lambda = -1$. (The
roots of this polynomial are found either by inspection, or by applying the
quadratic formula proved following Theorem 6.14.)

Substituting $\lambda = 4$ into (*) yields

$$\begin{aligned}
-3x + 2y &= 0 \\
3x - 2y &= 0
\end{aligned}$$

or y = (3/2)x. This means that every eigenvector corresponding to the
eigenvalue $\lambda = 4$ has the form v = (x, 3x/2). In other words, every
nonzero multiple of the vector v = (2, 3) is also an eigenvector with
eigenvalue equal to 4. If we substitute $\lambda = -1$ in (*), then we
similarly find y = -x, and hence every nonzero multiple of the vector
v = (1, -1) is an eigenvector with eigenvalue equal to -1. /
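The procedure of Example 7.1 (set the determinant to zero, then solve the resulting homogeneous system) can be sketched for a general 2 x 2 integer matrix. This is only a minimal illustration under stated assumptions: the function name eigenpairs_2x2 is hypothetical, and the sketch handles only matrices with integer eigenvalues and a nonzero upper-right entry b, in which case (b, λ - a) solves the first equation of the system.

```python
from math import isqrt

def eigenpairs_2x2(A):
    """Eigenvalues and one eigenvector each for A = [[a, b], [c, d]].

    Assumes integer entries, integer eigenvalues, and b != 0."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det       # discriminant of x^2 - trace*x + det
    if disc < 0:
        raise ValueError("no real eigenvalues")
    root = isqrt(disc)
    if root * root != disc or (trace + root) % 2:
        raise ValueError("sketch only handles integer eigenvalues")
    if b == 0:
        raise ValueError("sketch assumes b != 0")
    eigenvalues = sorted({(trace + root) // 2, (trace - root) // 2}, reverse=True)
    # (a - lam)x + b y = 0 is solved by (x, y) = (b, lam - a).
    return [(lam, (b, lam - a)) for lam in eigenvalues]

for lam, v in eigenpairs_2x2([[1, 2], [3, 2]]):
    print(lam, v)
```

For the matrix of Example 7.1 this gives the eigenvalue 4 with eigenvector (2, 3), and the eigenvalue -1 with eigenvector (2, -2), which is a multiple of the vector (1, -1) found in the example.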

We will generalize this approach in the next section. However, let us first 
take a brief look at some of the relationships between the eigenvalues of an 
operator and the roots of its minimal polynomial. 

Theorem 7.7 Let $\lambda$ be an eigenvalue of $T \in L(V)$. Then
$p(\lambda)$ is an eigenvalue of $p(T)$ for any $p \in \mathcal{F}[x]$.

Proof If $\lambda$ is an eigenvalue of T, then there exists a nonzero
$v \in V$ such that $Tv = \lambda v$. But then

$$T^2(v) = T(Tv) = T(\lambda v) = \lambda T(v) = \lambda^2 v$$

and by induction, it is clear that $T^k(v) = \lambda^k v$ for any
k = 1, 2, . . . . If we define $p = a_0 + a_1 x + \cdots + a_m x^m$, then we
have

$$p(T) = a_0 1 + a_1 T + \cdots + a_m T^m$$

and hence

$$\begin{aligned}
p(T)v &= a_0 v + a_1 \lambda v + \cdots + a_m \lambda^m v \\
      &= (a_0 + a_1 \lambda + \cdots + a_m \lambda^m)v \\
      &= p(\lambda)v .
\end{aligned}$$ ∎

Corollary Let $\lambda$ be an eigenvalue of $T \in L(V)$. Then $\lambda$ is
a root of the minimal polynomial for T.

Proof If m(x) is the minimal polynomial for T, then $m(T) = 0$ by
definition. From Theorem 7.7, we have $m(\lambda)v = m(T)v = 0$ where
$v \neq 0$ is an eigenvector corresponding to $\lambda$. But then
$m(\lambda) = 0$ (see Theorem 2.1(b)) so that $\lambda$ is a root of m(x). ∎

Since any eigenvalue of T is a root of the minimal polynomial for T, it is
natural to ask about the number of eigenvalues that exist for a given
$T \in L(V)$. Recall from the corollary to Theorem 6.4 that if
$c \in \mathcal{F}$ is a root of $f \in \mathcal{F}[x]$, then
$(x - c) \mid f$. If c is such that $(x - c)^m \mid f$ but no higher power
of x - c divides f, then we say that c is a root of multiplicity m. (The
context should make it clear whether we mean the multiplicity m or the
minimal polynomial m(x).) In counting the number of roots that a polynomial
has, we shall always count a root of multiplicity m as m roots. A root of
multiplicity 1 is frequently called a simple root.

If dim V = n then, since the minimal polynomial m for $T \in L(V)$ is of
degree at most $n^2$ (Corollary to Theorem 7.3), there can be at most $n^2$
roots of m (Theorem 6.14). In particular, we see that any $T \in L(V)$ has a
finite number of distinct eigenvalues. Moreover, if the field over which V
is defined is algebraically closed, then T will in fact have at least as
many (not necessarily distinct) eigenvalues as is the degree of its minimal
polynomial.

Theorem 7.8 If $v_1, \ldots, v_r$ are eigenvectors belonging to the distinct
eigenvalues $\lambda_1, \ldots, \lambda_r$ of $T \in L(V)$, then the set
$\{v_1, \ldots, v_r\}$ is linearly independent.

Proof If r = 1 there is nothing to prove, so we proceed by induction on r.
In other words, we assume that the theorem is valid for sets of fewer than r
eigenvectors and show that in fact it is valid for sets of size r. Suppose
that

$$a_1 v_1 + \cdots + a_r v_r = 0 \qquad (1)$$

for some set of scalars $a_i \in \mathcal{F}$. We apply T to this relation
to obtain

$$a_1 T(v_1) + \cdots + a_r T(v_r) = a_1 \lambda_1 v_1 + \cdots + a_r \lambda_r v_r = 0 . \qquad (2)$$

On the other hand, if we multiply (1) by $\lambda_r$ and subtract this from
(2), we find (since $Tv_i = \lambda_i v_i$)

$$a_1(\lambda_1 - \lambda_r)v_1 + \cdots + a_{r-1}(\lambda_{r-1} - \lambda_r)v_{r-1} = 0 .$$

By our induction hypothesis, the set $\{v_1, \ldots, v_{r-1}\}$ is linearly
independent, and hence $a_i(\lambda_i - \lambda_r) = 0$ for each
i = 1, . . . , r - 1. But the $\lambda_i$ are distinct so that
$\lambda_i - \lambda_r \neq 0$ for $i \neq r$, and therefore $a_i = 0$ for
each i = 1, . . . , r - 1. Using this result in (1) shows that $a_r = 0$
(since $v_r \neq 0$ by definition), and therefore
$a_1 = \cdots = a_r = 0$. This shows that the entire collection
$\{v_1, \ldots, v_r\}$ is independent. ∎

Corollary 1 Suppose $T \in L(V)$ and dim V = n. Then T can have at most n
distinct eigenvalues in $\mathcal{F}$.

Proof Since dim V = n, there can be at most n independent vectors in V.
Since n distinct eigenvalues result in n independent eigenvectors, this
corollary follows directly from Theorem 7.8. ∎

Corollary 2 Suppose $T \in L(V)$ and dim V = n. If T has n distinct
eigenvalues, then there exists a basis for V which consists of eigenvectors
of T.

Proof If T has n distinct eigenvalues, then (by Theorem 7.8) T must have n
linearly independent eigenvectors. But n is the number of elements in any
basis for V, and hence these n linearly independent eigenvectors in fact
form a basis for V. ∎

It should be remarked that one eigenvalue can belong to more than one
linearly independent eigenvector. In fact, if $T \in L(V)$ and $\lambda$ is
an eigenvalue of T, then the set $V_\lambda$ of all eigenvectors of T
belonging to $\lambda$ is a subspace of V called the eigenspace of
$\lambda$. It is also easy to see that
$V_\lambda = \mathrm{Ker}(\lambda 1 - T)$ (see Exercise 7.2.1).

Exercises 

1. (a) If $T \in L(V)$ and $\lambda$ is an eigenvalue of T, show that the
set $V_\lambda$ of all eigenvectors of T belonging to $\lambda$ is a
T-invariant subspace of V (i.e., a subspace with the property that
$T(v) \in V_\lambda$ for all $v \in V_\lambda$).

(b) Show that $V_\lambda = \mathrm{Ker}(\lambda 1 - T)$.

2. An operator $T \in L(V)$ with the property that $T^n = 0$ for some
$n \in \mathbb{Z}^+$ is said to be nilpotent. Show that the only eigenvalue
of a nilpotent operator is 0.

3. If $S, T \in L(V)$, show that ST and TS have the same eigenvalues. [Hint:
First use Theorems 5.16 and 7.6 to show that 0 is an eigenvalue of ST if
and only if 0 is an eigenvalue of TS. Now assume $\lambda \neq 0$, and let
$ST(v) = \lambda v$. Show that Tv is an eigenvector of TS.]

4. (a) Consider the rotation operator $R(\alpha) \in L(\mathbb{R}^2)$
defined in Example 1.2. Does $R(\alpha)$ have any eigenvectors? Explain.

(b) Repeat part (a) but now consider rotations in $\mathbb{R}^3$.

5. For each of the following matrices, find all eigenvalues and linearly
independent eigenvectors:

(a) $\begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}$

(b) $\begin{pmatrix} 4 & 2 \\ 3 & 3 \end{pmatrix}$

(c)



6. Consider the spaces $D[\mathbb{R}]$ and $F[\mathbb{R}]$ defined in
Exercise 2.1.6, and let $d: D[\mathbb{R}] \to F[\mathbb{R}]$ be the usual
derivative operator.

(a) Show that the eigenfunctions (i.e., eigenvectors) of d are of the form
$\exp(\lambda x)$ where $\lambda$ is the corresponding eigenvalue.

(b) Suppose $\lambda_1, \ldots, \lambda_r \in \mathbb{R}$ are distinct. Show
that the set

$$S = \{\exp(\lambda_1 x), \ldots, \exp(\lambda_r x)\}$$

is linearly independent. [Hint: Consider the linear span of S.]

7. Suppose $T \in L(V)$ is invertible. Show that $\lambda$ is an eigenvalue
of T if and only if $\lambda \neq 0$ and $\lambda^{-1}$ is an eigenvalue of
$T^{-1}$.

8. Suppose $T \in L(V)$ and dim V = n. If T has n linearly independent
eigenvectors, what can you say about the matrix representation of T?

9. Let V be a two-dimensional space over $\mathbb{R}$, and let
$\{e_1, e_2\}$ be a basis for V. Find the eigenvalues and eigenvectors of
the operator $T \in L(V)$ defined by:

(a) $Te_1 = e_1 + e_2$, $Te_2 = e_1 - e_2$.

(b) $Te_1 = 5e_1 + 6e_2$, $Te_2 = -7e_2$.

(c) $Te_1 = e_1 + 2e_2$, $Te_2 = 3e_1 + 6e_2$.

10. Suppose $A \in M_n(\mathbb{C})$ and define
$R_i = \sum_{j=1}^{n} |a_{ij}|$ and $P_i = R_i - |a_{ii}|$.

(a) Show that if $Ax = 0$ for some nonzero $x = (x_1, \ldots, x_n)$, then
for any r such that $x_r \neq 0$ we have

$$|a_{rr}|\,|x_r| = \Big|\sum_{j \neq r} a_{rj}x_j\Big| .$$

(b) Show that part (a) implies that for some r we have
$|a_{rr}| \le P_r$.

(c) Prove that if $|a_{ii}| > P_i$ for all i = 1, . . . , n, then all
eigenvalues of A are nonzero (or, equivalently, that $\det A \neq 0$).

11. (a) Suppose $A \in M_n(\mathbb{C})$ and let $\lambda$ be an eigenvalue
of A. Using the previous exercise prove Gershgorin's Theorem:
$|\lambda - a_{rr}| \le P_r$ for some r with $1 \le r \le n$.

(b) Use this result to show that every eigenvalue $\lambda$ of the matrix

$$A = \begin{pmatrix}
4 & 1 & 1 & 0 & 1 \\
1 & 3 & 1 & 0 & 0 \\
1 & 2 & 3 & 1 & 0 \\
1 & 1 & -1 & 4 & 0 \\
1 & 1 & 1 & 1 & 5
\end{pmatrix}$$

satisfies $1 \le |\lambda| \le 9$.



7.3 CHARACTERISTIC POLYNOMIALS 

So far our discussion has dealt only theoretically with the existence of
eigenvalues of an operator $T \in L(V)$. From a practical standpoint (as we
saw in Example 7.1), it is much more convenient to deal with the matrix
representation of an operator. Recall that the definition of an eigenvalue
$\lambda \in \mathcal{F}$ and eigenvector $v = \sum_i v_i e_i$ of a matrix
$A = (a_{ij}) \in M_n(\mathcal{F})$ is given in terms of components by
$\sum_j a_{ij}v_j = \lambda v_i$ for each i = 1, . . . , n. This may be
written in the form

$$\sum_{j=1}^{n} a_{ij}v_j = \lambda \sum_{j=1}^{n} \delta_{ij}v_j$$

or, alternatively, as

$$\sum_{j=1}^{n} (\lambda\delta_{ij} - a_{ij})v_j = 0 .$$

In matrix notation, this is

$$(\lambda I - A)v = 0 .$$

By the corollary to Theorem 4.13, this set of homogeneous equations has a
nontrivial solution if and only if $\det(\lambda I - A) = 0$.

Another way to see this is to note that by Theorem 7.6, $\lambda$ is an
eigenvalue of the operator $T \in L(V)$ if and only if $\lambda 1 - T$ is
singular. But according to Theorem 5.16, this means that
$\det(\lambda 1 - T) = 0$ (recall that the determinant of a linear
transformation T is defined to be the determinant of any matrix
representation of T). In other words, $\lambda$ is an eigenvalue of T if and
only if $\det(\lambda 1 - T) = 0$. This proves the following important
result.

Theorem 7.9 Suppose $T \in L(V)$ and $\lambda \in \mathcal{F}$. Then
$\lambda$ is an eigenvalue of T if and only if $\det(\lambda 1 - T) = 0$.

Let [T] be a matrix representation of T. The matrix $x1 - [T]$ is called the
characteristic matrix of [T], and the expression $\det(x1 - T) = 0$ is
called the characteristic (or secular) equation of T. The determinant
$\det(x1 - T)$ is frequently denoted by $\Delta_T(x)$. Writing out the
determinant in a particular basis, we see that $\det(x1 - T)$ is of the form

$$\Delta_T(x) = \begin{vmatrix}
x - a_{11} & -a_{12} & \cdots & -a_{1n} \\
-a_{21} & x - a_{22} & \cdots & -a_{2n} \\
\vdots & \vdots & & \vdots \\
-a_{n1} & -a_{n2} & \cdots & x - a_{nn}
\end{vmatrix}$$

where $A = (a_{ij})$ is the matrix representation of T in the chosen basis.
Since the expansion of a determinant contains exactly one element from each
row and each column, we see that (see Exercise 7.3.1)

$$\begin{aligned}
\det(x1 - T) &= (x - a_{11})(x - a_{22}) \cdots (x - a_{nn}) \\
&\quad + \text{terms containing at most } n - 2 \text{ factors of the form } x - a_{ii} \\
&\quad + \cdots + \text{terms with no factors containing } x \\
&= x^n - (\mathrm{Tr}\,A)x^{n-1} + \text{terms of lower degree in } x + (-1)^n \det A .
\end{aligned}$$

This polynomial is called the characteristic polynomial of T.

From the discussion following Theorem 5.18, we see that if
$A' = P^{-1}AP$ is similar to A, then

$$\det(xI - A') = \det(xI - P^{-1}AP) = \det[P^{-1}(xI - A)P] = \det(xI - A)$$

(since $\det P^{-1} = (\det P)^{-1}$). We thus see that similar matrices have
the same characteristic polynomial (the converse of this statement is not
true), and hence also the same eigenvalues. Therefore the eigenvalues (not
eigenvectors) of an operator $T \in L(V)$ do not depend on the basis chosen
for V. Note also that according to Exercise 4.2.14, we may as well write
Tr T and det T (rather than Tr A and det A) since these are independent of
the particular basis chosen. Using this terminology, we may rephrase Theorem
7.9 as follows.

Theorem 7.9' A scalar $\lambda \in \mathcal{F}$ is an eigenvalue of
$T \in L(V)$ if and only if $\lambda$ is a root of the characteristic
polynomial $\Delta_T(x)$.

Since the characteristic polynomial is of degree n in x, the corollary to
Theorem 6.14 tells us that if we are in an algebraically closed field (such
as $\mathbb{C}$), then there must exist n roots. In this case, the
characteristic polynomial may be factored into the form

$$\det(x1 - T) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$$

where the eigenvalues $\lambda_i$ are not necessarily distinct. Expanding
this expression we have

$$\det(x1 - T) = x^n - \Big(\sum_i \lambda_i\Big)x^{n-1} + \cdots + (-1)^n \lambda_1 \lambda_2 \cdots \lambda_n .$$

Comparing this with the above general expression for the characteristic
polynomial, we see that

$$\mathrm{Tr}\,T = \sum_{i=1}^{n} \lambda_i$$

and

$$\det T = \prod_{i=1}^{n} \lambda_i .$$

It should be remembered that this result only applies to an algebraically
closed field (or to any other field $\mathcal{F}$ as long as all n roots of
the characteristic polynomial lie in $\mathcal{F}$).

Example 7.2 Let us find the eigenvalues and eigenvectors of the matrix

$$A = \begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix} .$$

The characteristic polynomial of A is given by

$$\Delta_A(x) = \begin{vmatrix} x - 1 & -4 \\ -2 & x - 3 \end{vmatrix}
= x^2 - 4x - 5 = (x - 5)(x + 1)$$

and hence the eigenvalues of A are $\lambda = 5, -1$. To find the
eigenvectors corresponding to each eigenvalue, we must solve
$Av = \lambda v$ or $(\lambda I - A)v = 0$. Written out for $\lambda = 5$
this is

$$\begin{pmatrix} 4 & -4 \\ -2 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix} .$$

We must therefore solve the set of homogeneous linear equations

$$\begin{aligned}
4x - 4y &= 0 \\
-2x + 2y &= 0
\end{aligned}$$

which is clearly equivalent to the single equation x - y = 0, or x = y. This
means that every eigenvector corresponding to the eigenvalue $\lambda = 5$
is a multiple of the vector (1, 1), and thus the corresponding eigenspace is
one-dimensional.

For $\lambda = -1$ we have

$$\begin{pmatrix} -2 & -4 \\ -2 & -4 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

and the equation to be solved is (since both are the same) -2x - 4y = 0. The
solution is thus -x = 2y so that the eigenvector is a multiple of (2, -1).

We now note that

$$\mathrm{Tr}\,A = 1 + 3 = 4 = \sum_i \lambda_i$$

and

$$\det A = 3 - 8 = -5 = \prod_i \lambda_i .$$

It is also easy to see that these relationships hold for the matrix given in
Example 7.1. /
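The trace and determinant relations noted at the end of Example 7.2 are easy to confirm by machine for both worked examples. The following is a minimal sketch for the 2 x 2 case only; the helper names trace, det_2x2 and check_trace_and_det are our own.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def det_2x2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def check_trace_and_det(A, eigenvalues):
    """True if each given eigenvalue is a root of x^2 - (Tr A)x + det A,
    Tr A is their sum, and det A is their product."""
    lam1, lam2 = eigenvalues
    t, d = trace(A), det_2x2(A)
    roots_ok = all(lam * lam - t * lam + d == 0 for lam in eigenvalues)
    return roots_ok and t == lam1 + lam2 and d == lam1 * lam2

print(check_trace_and_det([[1, 2], [3, 2]], (4, -1)))   # matrix of Example 7.1
print(check_trace_and_det([[1, 4], [2, 3]], (5, -1)))   # matrix of Example 7.2
```

Both calls should report True, in agreement with the hand computation above.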



It is worth remarking that the existence of eigenvalues of a given operator
(or matrix) depends on the particular field we are working with. For
example, the matrix

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

has characteristic polynomial $x^2 + 1$ which has no real roots, but does
have the complex roots $\pm i$. In other words, A has no eigenvalues in
$\mathbb{R}$, but does have the eigenvalues $\pm i$ in $\mathbb{C}$ (see
Exercise 7.3.6).

Returning to the general case of an arbitrary field, it is clear that
letting $\lambda = 0$ in $\Delta_T(\lambda) = \det(\lambda 1 - T)$ shows
that the constant term in the characteristic polynomial of T is given by
$\Delta_T(0) = (-1)^n \det T$. In view of Theorems 4.6, 7.5 and the
corollary to Theorem 7.7, we wonder if there are any relationships between
the characteristic and minimal polynomials of T. There are indeed, and the
first step in showing some of these relationships is to prove that every
$A \in M_n(\mathcal{F})$ satisfies some nontrivial polynomial. This is the
essential content of our next result, the famous Cayley-Hamilton theorem.

Before we prove this theorem, we should point out that we will be dealing
with matrices that have polynomial entries, rather than entries that are
simply elements of the field $\mathcal{F}$. However, if we regard the
polynomials as elements of the field of quotients (see Section 6.5), then
all of our previous results (for example, those dealing with determinants)
remain valid. We shall elaborate on this problem in detail in the next
chapter. Furthermore, the proof we are about to give is the standard one at
this level. We shall find several other methods of proof throughout this
text, including a remarkably simple one in the next chapter (see the
discussion of matrices over the ring of polynomials).

Theorem 7.10 (Cayley-Hamilton Theorem) Every matrix
$A \in M_n(\mathcal{F})$ satisfies its characteristic polynomial.

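Proof First recall from Theorem 4.11 that any matrix
$A \in M_n(\mathcal{F})$ obeys the relation

$$A(\mathrm{adj}\,A) = (\det A)I_n$$

where adj A is the matrix whose elements are the determinants of the minor
matrices of A. In particular, the characteristic matrix xI - A obeys

$$(xI - A)B(x) = \det(xI - A)I$$

where we let

$$B(x) = \mathrm{adj}(xI - A) .$$

Thus the entries of the n x n matrix B(x) are polynomials in x of degree
$\le n - 1$. For example, if

$$B(x) = \begin{pmatrix}
x^2 + 2 & x & 3 \\
-x + 1 & 1 & 0 \\
0 & 4 & x^2
\end{pmatrix}$$

then

$$B(x) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} x^2
+ \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} x
+ \begin{pmatrix} 2 & 0 & 3 \\ 1 & 1 & 0 \\ 0 & 4 & 0 \end{pmatrix} .$$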

Hence in general, we may write B(x) in the form

$$B(x) = B_0 + B_1 x + \cdots + B_{n-1}x^{n-1}$$

where each $B_i \in M_n(\mathcal{F})$.

Now write

$$\Delta_A(x) = \det(xI - A) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n .$$

Then Theorem 4.11 becomes

$$(xI - A)(B_0 + B_1 x + \cdots + B_{n-1}x^{n-1}) = (a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n)I .$$

Equating powers of x in this equation yields

$$\begin{aligned}
-AB_0 &= a_0 I \\
B_0 - AB_1 &= a_1 I \\
B_1 - AB_2 &= a_2 I \\
&\;\;\vdots \\
B_{n-2} - AB_{n-1} &= a_{n-1} I \\
B_{n-1} &= I
\end{aligned}$$

We now multiply the first of these equations by $A^0 = I$, the second by
$A^1 = A$, the third by $A^2$, . . . , the nth by $A^{n-1}$, and the last by
$A^n$ to obtain

$$\begin{aligned}
-AB_0 &= a_0 I \\
AB_0 - A^2 B_1 &= a_1 A \\
A^2 B_1 - A^3 B_2 &= a_2 A^2 \\
&\;\;\vdots \\
A^{n-1}B_{n-2} - A^n B_{n-1} &= a_{n-1}A^{n-1} \\
A^n B_{n-1} &= A^n
\end{aligned}$$

Adding this last group of matrix equations, we see that the left side
cancels out and we are left with

$$0 = a_0 I + a_1 A + a_2 A^2 + \cdots + a_{n-1}A^{n-1} + A^n .$$

This shows that A satisfies its characteristic polynomial. ∎
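The recursion appearing in this proof, namely $B_{n-1} = I$ and $B_{j-1} = AB_j + a_j I$, can also be run forwards to compute the coefficients $a_j$ themselves. The sketch below does so using the Faddeev-LeVerrier formula $a_{n-k} = -\mathrm{Tr}(AB_{n-k})/k$. That formula is a standard technique which we bring in as an assumption (it is not derived in this text), and since it divides by k it is only suitable for fields such as the rationals. The function names are our own.

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_and_adjugate(A):
    """Return (a, B) with det(xI - A) = a[0] + a[1]x + ... + a[n]x^n (a[n] = 1)
    and adj(xI - A) = B[0] + B[1]x + ... + B[n-1]x^(n-1)."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    a = [Fraction(0)] * n + [Fraction(1)]
    B = [None] * n
    B[n - 1] = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        AB = mat_mul(A, B[n - k])
        a[n - k] = -sum(AB[i][i] for i in range(n)) / k    # Faddeev-LeVerrier step
        if k < n:
            # B_{j-1} = A B_j + a_j I with j = n - k, as in the proof above.
            B[n - k - 1] = [[AB[i][j] + (a[n - k] if i == j else 0)
                             for j in range(n)] for i in range(n)]
    return a, B

def poly_at_matrix(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_m A^m by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

A = [[2, 1, 0], [0, 2, 0], [0, 0, 5]]      # illustrative matrix
a, B = char_poly_and_adjugate(A)
print(a)                                    # coefficients of (x - 2)^2 (x - 5)
print(poly_at_matrix(a, A))                 # Cayley-Hamilton: the zero matrix
```

For this sample matrix the coefficients are those of $x^3 - 9x^2 + 24x - 20$, and substituting A gives the zero matrix, as the theorem asserts.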

In view of this theorem, we see that there exists a nonempty set of nonzero
polynomials $p(x) \in \mathcal{F}[x]$ such that $p(A) = 0$ for any
$A \in M_n(\mathcal{F})$. (Alternatively, Theorem 7.3 and its corollary
apply equally well to the algebra of matrices, although the degree is
bounded by $n^2$ rather than by n.) As we did for linear transformations, we
may define the minimal polynomial for A as that polynomial p(x) of least
degree for which $p(A) = 0$. We also noted following Theorem 7.4 that any
$T \in L(V)$ and its corresponding representation
$A \in M_n(\mathcal{F})$ satisfy the same minimal polynomial. Theorem 7.4
thus applies equally well to matrices, and hence there exists a unique monic
minimal polynomial m(x) for A such that $m(A) = 0$. In addition, m(x)
divides every other polynomial which has A as a zero. In particular, since A
satisfies $\Delta_A(x)$, we must have $m(x) \mid \Delta_A(x)$.

Theorem 7.11 Suppose $A \in M_n(\mathcal{F})$ and m(x) is the minimal
polynomial for A. Then $m(x) \mid \Delta_A(x)$ and
$\Delta_A(x) \mid [m(x)]^n$.

Proof That $m(x) \mid \Delta_A(x)$ was proved in the previous paragraph. Let
$m(x) = x^k + m_1 x^{k-1} + \cdots + m_{k-1}x + m_k$. We define the matrices
$B_i \in M_n(\mathcal{F})$ by

$$\begin{aligned}
B_0 &= I \\
B_1 &= A + m_1 I \\
B_2 &= A^2 + m_1 A + m_2 I \\
&\;\;\vdots \\
B_{k-1} &= A^{k-1} + m_1 A^{k-2} + \cdots + m_{k-1} I
\end{aligned}$$

where $I = I_n$. Working our way successively down this set of equations, we
may rewrite them in the form

$$\begin{aligned}
B_1 - AB_0 &= m_1 I \\
B_2 - AB_1 &= m_2 I \\
&\;\;\vdots \\
B_{k-1} - AB_{k-2} &= m_{k-1} I
\end{aligned}$$

From the previous expression for $B_{k-1}$, we multiply by -A and then add
and subtract $m_k I$ to obtain (using $m(A) = 0$)

$$\begin{aligned}
-AB_{k-1} &= m_k I - (A^k + m_1 A^{k-1} + \cdots + m_{k-1}A + m_k I) \\
&= m_k I - m(A) \\
&= m_k I .
\end{aligned}$$

Now define $B(x) = x^{k-1}B_0 + x^{k-2}B_1 + \cdots + xB_{k-2} + B_{k-1}$
(which may be viewed either as a polynomial with matrix coefficients or as a
matrix with polynomial entries). Then, using our previous results, we find
that

$$\begin{aligned}
(xI - A)B(x) &= xB(x) - AB(x) \\
&= (x^k B_0 + x^{k-1}B_1 + \cdots + x^2 B_{k-2} + xB_{k-1}) \\
&\quad - (x^{k-1}AB_0 + x^{k-2}AB_1 + \cdots + xAB_{k-2} + AB_{k-1}) \\
&= x^k B_0 + x^{k-1}(B_1 - AB_0) + x^{k-2}(B_2 - AB_1) \\
&\quad + \cdots + x(B_{k-1} - AB_{k-2}) - AB_{k-1} \\
&= x^k I + m_1 x^{k-1} I + m_2 x^{k-2} I + \cdots + m_{k-1}xI + m_k I \\
&= m(x)I . \qquad (*)
\end{aligned}$$

Since the determinant of a diagonal matrix is the product of its (diagonal)
elements (see the corollary to Theorem 4.5), we see that

$$\det[m(x)I] = [m(x)]^n .$$

Therefore, taking the determinant of both sides of (*) and using Theorem 4.8
we find that

$$[\det(xI - A)][\det B(x)] = \det[m(x)I] = [m(x)]^n .$$

But det B(x) is just some polynomial in x, so this equation shows that
$[m(x)]^n$ is some multiple of $\Delta_A(x) = \det(xI - A)$. In other words,
$\Delta_A(x) \mid [m(x)]^n$. ∎

Theorem 7.12 The characteristic polynomial $\Delta_A(x)$ and minimal
polynomial m(x) of a matrix $A \in M_n(\mathcal{F})$ have the same prime
factors.

Proof Let m(x) have the prime factor f(x) so that $f(x) \mid m(x)$. Since we
showed in the above discussion that $m(x) \mid \Delta_A(x)$, it follows that
$f(x) \mid \Delta_A(x)$ and hence f(x) is a factor of $\Delta_A(x)$ also.
Now suppose that $f(x) \mid \Delta_A(x)$. Theorem 7.11 shows that
$\Delta_A(x) \mid [m(x)]^n$, and therefore $f(x) \mid [m(x)]^n$. However,
since f(x) is prime, Corollary 2' of Theorem 6.5 tells us that
$f(x) \mid m(x)$. ∎

It is important to realize that this theorem does not say that
$\Delta_A(x) = m(x)$, but only that $\Delta_A(x)$ and m(x) have the same
prime factors. However, each factor can be of a different multiplicity in
m(x) from what it is in $\Delta_A(x)$. In particular, since
$m(x) \mid \Delta_A(x)$, the multiplicity of any factor in $\Delta_A(x)$
must be greater than or equal to the multiplicity of the same factor in
m(x). Since a linear factor (i.e., a factor of the form $x - \lambda$) is
prime, it then follows that $\Delta_A(x)$ and m(x) have the same roots
(although possibly of different multiplicities).

Theorem 7.13 Suppose $A \in M_n(\mathcal{F})$ and
$\lambda \in \mathcal{F}$. Then $\lambda$ is an eigenvalue of A if and only
if $\lambda$ is a root of the minimal polynomial for A.

Proof By Theorem 7.9, $\lambda$ is an eigenvalue of A if and only if
$\Delta_A(\lambda) = 0$. But from the above remarks, $\lambda$ is a root of
$\Delta_A(x)$ if and only if $\lambda$ is a root of the minimal polynomial
for A. ∎

An alternative proof of Theorem 7.13 is to note that since
$m(x) \mid \Delta_A(x)$, we may write $\Delta_A(x) = m(x)p(x)$ for some
polynomial p(x). If $\lambda$ is a root of m(x), then
$\Delta_A(\lambda) = m(\lambda)p(\lambda) = 0$ so that $\lambda$ is also a
root of $\Delta_A(x)$. In other words, $\lambda$ is an eigenvalue of A. The
converse is just the corollary to Theorem 7.7. If we use this proof, then
Theorem 7.12 is essentially just a corollary of Theorem 7.13.

Using Theorem 7.13, we can give another proof of Theorem 7.5 which also
applies to the characteristic polynomial of any $T \in L(V)$. In particular,
from Theorem 5.10 we see that T is invertible if and only if T is
nonsingular if and only if 0 is not an eigenvalue of T (because this would
mean that $Tv = 0v = 0$ for some $v \neq 0$). But from Theorem 7.13, this is
true if and only if 0 is not a root of the minimal polynomial for T. Writing
the minimal polynomial as
$m(x) = a_0 + a_1 x + \cdots + a_{k-1}x^{k-1} + x^k$, we then see that
$a_0 = m(0) \neq 0$ as claimed.

Example 7.3 Consider the matrix A given by

$$A = \begin{pmatrix}
2 & 1 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 5
\end{pmatrix} .$$

The characteristic polynomial is given by

$$\Delta_A(x) = \det(xI - A) = (x - 2)^3(x - 5)$$

and hence Theorem 7.12 tells us that both x - 2 and x - 5 must be factors of
m(x). Furthermore, it follows from Theorems 7.4 and 7.10 that
$m(x) \mid \Delta_A(x)$, and thus the minimal polynomial must be either
$m_1(x)$, $m_2(x)$ or $m_3(x)$ where

$$m_j(x) = (x - 2)^j(x - 5) .$$

From the Cayley-Hamilton theorem we know that
$m_3(A) = \Delta_A(A) = 0$, and it is easy to show that $m_2(A) = 0$ also
while $m_1(A) \neq 0$. Therefore the minimal polynomial for A must be
$m_2(x)$. /
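The claim in Example 7.3 that the second candidate annihilates A while the first does not can be checked by direct multiplication. The sketch below assumes that A is the 4 x 4 matrix with diagonal entries 2, 2, 2, 5 and a single 1 directly above the first diagonal 2, which is consistent with the characteristic polynomial stated in the example. The helper names (shift, is_zero, m) are our own.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(A, c):
    """Return A - cI."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]

def is_zero(M):
    return all(v == 0 for row in M for v in row)

# Assumed form of the matrix in Example 7.3.
A = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]

def m(j):
    """Evaluate the candidate (x - 2)^j (x - 5) at A."""
    result = shift(A, 5)
    for _ in range(j):
        result = mat_mul(shift(A, 2), result)
    return result

for j in (1, 2, 3):
    print(j, is_zero(m(j)))
```

Only the first candidate fails to vanish, so the least degree annihilating polynomial among the three is the one with j = 2.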
We now turn our attention to one of the most important aspects of the
existence of eigenvalues. Suppose that $T \in L(V)$ with dim V = n. If V has
a basis $\{v_1, \ldots, v_n\}$ that consists entirely of eigenvectors of T,
then the matrix representation of T in this basis is defined by

$$T(v_i) = \sum_{j=1}^{n} v_j a_{ji} = \lambda_i v_i = \sum_{j=1}^{n} \delta_{ji}\lambda_j v_j$$

and therefore $a_{ji} = \delta_{ji}\lambda_j$. In other words, T is
represented by a diagonal matrix in a basis of eigenvectors. Conversely, if
T is represented by a diagonal matrix $a_{ji} = \delta_{ji}\lambda_j$
relative to some basis $\{v_i\}$, then reversing the argument shows that
each $v_i$ is an eigenvector of T. This proves the following theorem.

Theorem 7.14 A linear operator $T \in L(V)$ can be represented by a diagonal
matrix if and only if V has a basis consisting of eigenvectors of T. If this is the
case, then the diagonal elements of the matrix representation are precisely the
eigenvalues of T. (Note, however, that the eigenvalues need not necessarily be
distinct.)

If $T \in L(V)$ is represented in some basis by a matrix A, and in the
basis of eigenvectors $v_i$ by a diagonal matrix D, then Theorem 5.18 tells us
that A and D must be similar matrices. This proves the following version of
Theorem 7.14, which we state as a corollary.

Corollary 1 A matrix $A \in M_n(\mathcal{F})$ is similar to a diagonal matrix D if and
only if A has n linearly independent eigenvectors.

Corollary 2 A linear operator $T \in L(V)$ can be represented by a diagonal
matrix if T has n = dim V distinct eigenvalues.

Proof This follows from Corollary 2 of Theorem 7.8. ■

Note that the existence of n = dim V distinct eigenvalues of $T \in L(V)$ is a
sufficient but not necessary condition for T to have a diagonal representation.
For example, the identity operator has the usual diagonal representation, but
its only eigenvalues are $\lambda = 1$. In general, if any eigenvalue has multiplicity
greater than 1, then there will be fewer distinct eigenvalues than the 
dimension of V. However, in this case we may be able to choose an 
appropriate linear combination of eigenvectors in each eigenspace so that the 
matrix of T will still be diagonal. We shall have more to say about this in 
Section 7.7. 

We say that a matrix A is diagonalizable if it is similar to a diagonal
matrix D. If P is a nonsingular matrix such that $D = P^{-1}AP$, then we say that P
diagonalizes A. It should be noted that if $\lambda$ is an eigenvalue of a matrix A
with eigenvector v (i.e., $Av = \lambda v$), then for any nonsingular matrix P we have

$$(P^{-1}AP)(P^{-1}v) = P^{-1}Av = P^{-1}\lambda v = \lambda(P^{-1}v).$$

In other words, $P^{-1}v$ is an eigenvector of $P^{-1}AP$. Similarly, we say that
$T \in L(V)$ is diagonalizable if there exists a basis for V that consists entirely of
eigenvectors of T.

While all of this sounds well and good, the reader might wonder exactly 
how the transition matrix P is to be constructed. Actually, the method has 
already been given in Section 5.4. If $T \in L(V)$ and A is the matrix
representation of T in a basis $\{e_i\}$, then P is defined to be the transformation
that takes the basis $\{e_i\}$ into the basis $\{v_i\}$ of eigenvectors. In other words,
$v_i = Pe_i = \sum_j e_j p_{ji}$. This means that the ith column of $(p_{ji})$ is just the ith
eigenvector of A. The fact that P must be nonsingular coincides with the
requirement that T (or A) have n linearly independent eigenvectors $v_i$.

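Example 7.4 Referring to Example 7.2, we found the eigenvectors $v_1 = (1, 1)$
and $v_2 = (2, -1)$ belonging to the matrix

$$A = \begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix}.$$

Then P and $P^{-1}$ are given by

$$P = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix}$$

and

$$P^{-1} = \frac{\operatorname{adj} P}{\det P} = \begin{pmatrix} 1/3 & 2/3 \\ 1/3 & -1/3 \end{pmatrix}$$

and therefore

$$D = P^{-1}AP = \begin{pmatrix} 1/3 & 2/3 \\ 1/3 & -1/3 \end{pmatrix}\begin{pmatrix} 1 & 4 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & -1 \end{pmatrix}.$$

We see that D is a diagonal matrix, and that the diagonal elements are just the
eigenvalues of A. Note also that

$$D(P^{-1}v_1) = \begin{pmatrix} 5 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1/3 & 2/3 \\ 1/3 & -1/3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \lambda_1(P^{-1}v_1)$$

with a similar result holding for $P^{-1}v_2$. ∆
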
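
The diagonalization in Example 7.4 can be reproduced with exact rational
arithmetic. The sketch below assumes the matrix A with rows (1, 4) and (2, 3)
and takes the eigenvectors (1, 1) and (2, -1) as the columns of P; the helper
names are illustrative only.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via adj(M) / det(M)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 4], [2, 3]]
P = [[1, 2], [1, -1]]          # columns are v1 = (1, 1) and v2 = (2, -1)
P_inv = inverse_2x2(P)
D = mat_mul(mat_mul(P_inv, A), P)

print(P_inv)
print(D)
```

The diagonal entries of D are the eigenvalues 5 and -1, listed in the same
order as the eigenvectors were placed into the columns of P.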
Example 7.5 Let us show that the matrix

$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$$

is not diagonalizable. The characteristic equation is $(x - 1)^2 = 0$, and hence
there are two identical roots $\lambda = 1$. If there existed an eigenvector v = (x, y), it
would have to satisfy the equation $(\lambda I - A)v = 0$ or

$$\begin{pmatrix} 0 & -2 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Since this yields -2y = 0, the eigenvectors must be of the form (x, 0), and
hence it is impossible to find two linearly independent such eigenvectors.

Note that the minimal polynomial for A is either $x - 1$ or $(x - 1)^2$. But
since $A - I \neq 0$, m(x) must be $(x - 1)^2$. We will see later (Theorem 7.24) that
a matrix is diagonalizable if and only if its minimal polynomial is a product of
distinct linear factors. ∆
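
As a quick check of the last remark in Example 7.5, the following worked
computation (added here as an illustration, using only the matrix A of the
example) shows that $x - 1$ does not annihilate A while $(x - 1)^2$ does:

```latex
\[
A - I = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} \neq 0,
\qquad
(A - I)^2
  = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}
  = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
```

Since the minimal polynomial has the repeated factor $x - 1$, this agrees with
the failure to find two linearly independent eigenvectors.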



Exercises 

1. Suppose $T \in L(V)$ has matrix representation $A = (a_{ij})$, and dim V = n.
Prove that

$$\det(x1 - T) = x^n - (\operatorname{Tr} A)x^{n-1} + \text{terms of lower degree in } x + (-1)^n \det A.$$

[Hint: Use the definition of determinant.]

2. Suppose $T \in L(V)$ is diagonalizable. Show that the minimal polynomial
m(x) for T must consist of distinct linear factors. [Hint: Let T have distinct
eigenvalues $\lambda_1, \ldots, \lambda_r$ and consider the polynomial

$$f(x) = (x - \lambda_1) \cdots (x - \lambda_r).$$

Show that m(x) = f(x).]

3. Prove by direct substitution that $\Delta_A(A) = 0$ if $A \in M_n(\mathcal{F})$ is diagonal.

4. Find, in the form $a_0 + a_1 x + a_2 x^2 + a_3 x^3$, the characteristic polynomial of





(1 


2 -\\ 


A = 





3 1 






-2^ 



Show by direct substitution that A satisfies its characteristic polynomial. 

5. If $T \in L(V)$ and $\Delta_T(x)$ is a product of distinct linear factors, prove that T
is diagonalizable.

6. Consider the following matrices: 




(a) Find all eigenvalues and linearly independent eigenvectors over R. 

(b) Find all eigenvalues and linearly independent eigenvectors over C. 

7. For each of the following matrices, find all eigenvalues, a basis for each 
eigenspace, and determine whether or not the matrix is diagonalizable: 





^1 -3 


3^ 




(-?> 


1 -i\ 




3 -5 


3 


{b) 


-7 


5 -1 




^6 -6 


4 




r6 


6 -2^ 



8. Consider the operator $T \in L(\mathbb{R}^3)$ defined by

$$T(x, y, z) = (2x + y,\ y - z,\ 2y + 4z).$$

Find all eigenvalues and a basis for each eigenspace.

9. Let $A = (a_{ij})$ be a triangular matrix, and assume that all of the diagonal
entries of A are distinct. Is A diagonalizable? Explain.

10. Suppose $A \in M_3(\mathbb{R})$. Show that A can not be a zero of the polynomial
$f = x^2 + 1$.

11. If $A \in M_n(\mathcal{F})$, show that A and $A^T$ have the same eigenvalues.

12. Suppose A is a block triangular matrix with square matrices $A_{ii}$ on the
diagonal. Show that the characteristic polynomial of A is the product of
the characteristic polynomials of the $A_{ii}$.

13. Find the minimal polynomial of 



A = 



(2 


1 





0^ 





2 














1 


1 


.0 





-2 


4y 



14. For each of the following matrices A, find a nonsingular matrix P (if it
exists) such that $P^{-1}AP$ is diagonal:







1 


1^ 




( 1 


2 


2^ 




(\ 


1 


0\ 


(a) A = 


2 


4 


2 


ib) 


1 


2 


-1 


(c) 





1 







.1 


1 


3. 




ri 


1 


4 




.0 








15. Consider the following real matrix:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

Find necessary and sufficient conditions on a, b, c and d so that A is
diagonalizable.

16. Let A be an idempotent matrix (i.e., $A^2 = A$) of rank r. Show that A is
similar to the matrix

$$B = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$

17. Let V be the space of all real polynomials $f \in \mathbb{R}[x]$ of degree at most 2,
and define $T \in L(V)$ by $Tf = f + f' + xf'$ where $f'$ denotes the usual
derivative with respect to x.
(a) Write down the most obvious basis $\{e_1, e_2, e_3\}$ for V you can think
of, and then write down $[T]_e$.

(b) Find all eigenvalues of T, and then find a nonsingular matrix P such
that $P^{-1}[T]_e P$ is diagonal.

18. Prove that any real symmetric 2x2 matrix is diagonalizable. 

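19. (a) Let $C \in M_2(\mathbb{C})$ be such that $C^2 = 0$. Prove that either C = 0, or else C
is similar over $\mathbb{C}$ to the matrix

$$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

(b) Prove that $A \in M_2(\mathbb{C})$ is similar over $\mathbb{C}$ to one of the following two
types of matrices:

$$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} a & 0 \\ 1 & a \end{pmatrix}.$$

20. Find a matrix $A \in M_3(\mathbb{R})$ that has the eigenvalues 3, 2 and 2 with
corresponding eigenvectors (2, 1, 1), (1, 0, 1) and (0, 0, 4).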



21. Is it possible for the matrix 



A = 



/ 3 1 0\ 

-10 

2 1 

^ 3 ?; 



to have the eigenvalues -1, 2, 3 and 5? 



7.4 ANNIHILATORS 

The purpose of this section is to repeat our description of minimal polynomials
using the formalism of ideals developed in the previous chapter. Our
reasons for this are twofold. First, we will gain additional insight into the
meaning and action of minimal polynomials. And second, these results will be
of use in the next chapter when we discuss cyclic subspaces.

If V is a vector space of dimension n over $\mathcal{F}$, then for any $v \in V$ and any
$T \in L(V)$, the n + 1 vectors $v, T(v), T^2(v), \ldots, T^n(v)$ must be linearly
dependent. This means there exist scalars $a_0, \ldots, a_n \in \mathcal{F}$, not all equal to zero,
such that

$$\sum_{i=0}^{n} a_i T^i(v) = 0.$$

If we define the polynomial $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathcal{F}[x]$, we see that
our relation may be written as f(T)(v) = 0. In other words, given any (fixed)
$v \in V$ and $T \in L(V)$, there exists a polynomial f of degree $\leq n$ such that
f(T)(v) = 0.

Now, for any fixed $v \in V$ and $T \in L(V)$, we define the set $N_T(v)$ by

$$N_T(v) = \{f(x) \in \mathcal{F}[x] : f(T)(v) = 0\}.$$

This set is called the T-annihilator of v. If $f_1, f_2 \in N_T(v)$, then we have

$$[f_1(T) \pm f_2(T)](v) = f_1(T)(v) \pm f_2(T)(v) = 0$$

and for any $g(x) \in \mathcal{F}[x]$, we also have

$$[g(T)f_1(T)](v) = g(T)[f_1(T)(v)] = 0.$$

This shows that $N_T(v)$ is actually an ideal of $\mathcal{F}[x]$. Moreover, from Theorem
6.8 and its corollary, we see that $N_T(v)$ is a principal ideal, and hence has a
unique monic generator, which we denote by $m_v(x)$. By definition, this means
that $N_T(v) = m_v(x)\mathcal{F}[x]$, and since we showed above that $N_T(v)$ contains at
least one polynomial of degree less than or equal to n, it follows from
Theorem 6.2(b) that $\deg m_v(x) \leq n$. We call $m_v(x)$ the minimal polynomial of
the vector v corresponding to the given transformation T. (Many authors also
refer to $m_v(x)$ as the T-annihilator of v, or the order of v.) It is thus the
unique monic polynomial of least degree such that $m_v(T)(v) = 0$.

Theorem 7.15 Suppose $T \in L(V)$, and let $v \in V$ have minimal polynomial
$m_v(x)$. Assume $m_v(x)$ is reducible so that $m_v(x) = m_1(x)m_2(x)$ where $m_1(x)$
and $m_2(x)$ are both monic polynomials of degree $\geq 1$. Then the vector
$w = m_1(T)(v) \in V$ has minimal polynomial $m_2(x)$. In other words, every factor of
the minimal polynomial of a vector is also the minimal polynomial of some
other vector.

Proof First note that

$$m_2(T)(w) = m_2(T)[m_1(T)(v)] = m_v(T)(v) = 0$$

and thus $m_2(x) \in N_T(w)$. Then for any nonzero polynomial $f(x) \in \mathcal{F}[x]$ with
f(T)(w) = 0, we have $f(T)[m_1(T)(v)] = 0$, and hence $f(x)m_1(x) \in N_T(v) =
m_v(x)\mathcal{F}[x]$. Using this result along with Theorem 6.2(b), we see that

$$\deg m_1 + \deg f = \deg m_1 f \geq \deg m_v = \deg m_1 + \deg m_2$$

and therefore $\deg m_2 \leq \deg f$. This shows that $m_2(x)$ is the monic polynomial
of least degree such that $m_2(T)(w) = 0$. ■

Theorem 7.16 Suppose $T \in L(V)$, and let $m_u(x)$ and $m_v(x)$ be the minimal
polynomials of $u \in V$ and $v \in V$ respectively. Then the least common
multiple m(x) of $m_u(x)$ and $m_v(x)$ is the minimal polynomial of some vector
in V.

Proof We first assume that $m_u(x)$ and $m_v(x)$ are relatively prime so that their
greatest common divisor is 1. By Theorem 6.9 we then have $m_u(x)m_v(x) =
m(x)$. From Theorem 6.5 we may write $m_u(x)k(x) + m_v(x)h(x) = 1$ for some
polynomials $h, k \in \mathcal{F}[x]$, and therefore

$$m_u(T)k(T) + m_v(T)h(T) = 1.$$

Now define the vector $w \in V$ by

$$w = h(T)(u) + k(T)(v).$$

Then, using $m_u(T)(u) = 0 = m_v(T)(v)$ we have

$$\begin{aligned}
m_u(T)(w) &= m_u(T)h(T)(u) + m_u(T)k(T)(v) \\
&= m_u(T)k(T)(v) \\
&= [1 - m_v(T)h(T)](v) \\
&= v
\end{aligned}$$

and similarly, we find that $m_v(T)(w) = u$. This means $m_u(T)m_v(T)(w) = 0$ so
that $m_u(x)m_v(x) \in N_T(w)$.

Now observe that $N_T(w) = m_w(x)\mathcal{F}[x]$ where $m_w(x)$ is the minimal
polynomial of w. Then

$$m_w(T)(u) = m_w(T)m_v(T)(w) = 0$$

and

$$m_w(T)(v) = m_w(T)m_u(T)(w) = 0$$



so that $m_w(x) \in N_T(u) \cap N_T(v)$. Since $N_T(u) = m_u(x)\mathcal{F}[x]$ and $N_T(v) =
m_v(x)\mathcal{F}[x]$, we see from Example 6.9 (along with the fact that $m_u(x)$ and
$m_v(x)$ are relatively prime) that

$$m_w(x) \in m(x)\mathcal{F}[x] = m_u(x)m_v(x)\mathcal{F}[x]$$

and hence $m_u(x)m_v(x) \mid m_w(x)$. On the other hand, since

$$m_u(x)m_v(x) \in N_T(w) = m_w(x)\mathcal{F}[x]$$

we have $m_w(x) \mid m_u(x)m_v(x)$. Since $m_w(x)$, $m_u(x)$ and $m_v(x)$ are monic, it
follows that $m_w(x) = m_u(x)m_v(x) = m(x)$. This shows that in the case where
$m_u(x)$ and $m_v(x)$ are relatively prime, $m(x) = m_u(x)m_v(x)$ is the minimal
polynomial of w.

Now let d(x) be the greatest common divisor of $m_u(x)$ and $m_v(x)$, and
consider the general case where (see Theorem 6.9) $m_u(x)m_v(x) = m(x)d(x)$. Using
the notation of Theorem 6.10, we write $m_u = \alpha\beta$ and $m_v = \gamma\delta$. Since $m_u$ and
$m_v$ are minimal polynomials by hypothesis, Theorem 7.15 tells us that $\alpha$ and $\delta$
are each also the minimal polynomial of some vector. However, by their
construction, $\alpha$ and $\delta$ are relatively prime since they have no factors in common.
This means that we may apply the first part of this proof to conclude that $\alpha\delta$ is
the minimal polynomial of some vector. To finish the proof, we simply note
that (according to Theorem 6.10) $\alpha\delta$ is just m(x). ■

A straightforward induction argument gives the following result. 

Corollary For each i = 1, . . . , k let $m_{v_i}(x)$ be the minimal polynomial of a
vector $v_i \in V$. Then there exists a vector $w \in V$ whose minimal polynomial
m(x) is the least common multiple of the $m_{v_i}(x)$.

Now suppose that $T \in L(V)$ and V has a basis $\{v_1, \ldots, v_n\}$. If $m_{v_i}(x)$ is
the minimal polynomial of $v_i$, then by the corollary to Theorem 7.16, the least
common multiple m(x) of the $m_{v_i}(x)$ is the minimal polynomial of some
vector $w \in V$, and therefore $\deg m(x) \leq \dim V = n$. But m(x) is the least common
multiple, so that for each i = 1, . . . , n we have $m(x) = f_i(x)m_{v_i}(x)$ for some
$f_i(x) \in \mathcal{F}[x]$. This means that

$$m(T)(v_i) = [f_i(T)m_{v_i}(T)](v_i) = f_i(T)[m_{v_i}(T)(v_i)] = 0$$

for each $v_i$, and hence m(T) = 0. In other words, every $T \in L(V)$ satisfies
some monic polynomial m(x) with $\deg m(x) \leq \dim V = n$.
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We now define the (nonempty) set 



$$N_T = \{f(x) \in \mathcal{F}[x] : f(T) = 0\}.$$

As was the case with the T-annihilator, it is easy to prove that $N_T$ is an ideal
of $\mathcal{F}[x]$. Since $N_T$ consists of those polynomials in T that annihilate every
vector in V, it must be the same as the intersection of all T-annihilators $N_T(v)$
in V, i.e.,

$$N_T = \bigcap_{v \in V} N_T(v).$$

By Theorem 6.8 the ideal $N_T$ is principal, and we define the minimal
polynomial for $T \in L(V)$ to be the unique monic generator of $N_T$. We claim that the
minimal polynomial for T is precisely the polynomial m(x) defined in the
previous paragraph.

To see this, note first that $\deg m(x) \leq \dim V = n$, and since m(x) is the
minimal polynomial of some $w \in V$, it follows directly from the definition of
the minimal polynomial of w as the unique monic generator of $N_T(w)$ that
$N_T(w) = m(x)\mathcal{F}[x]$. Next, the fact that m(T) = 0 means that m(T)(v) = 0 for
every $v \in V$, and therefore $m(x) \in \bigcap_{v \in V} N_T(v) = N_T$. Since any polynomial
in $N_T(w)$ is a multiple of m(x) and hence annihilates every $v \in V$, we see that
$N_T(w) \subset N_T$. Conversely, any element of $N_T$ is automatically an element of
$N_T(w)$, and thus $N_T = N_T(w) = m(x)\mathcal{F}[x]$. This shows that m(x) is the
minimal polynomial for T, and since m(x) generates $N_T$, it is the polynomial
of least degree such that m(T) = 0.

Example 7.6 Let $V = \mathbb{R}^4$ have basis $\{e_1, e_2, e_3, e_4\}$ and define the operator
$T \in L(V)$ by

$$\begin{aligned}
T(e_1) &= e_1 + e_3 & T(e_2) &= 3e_2 - e_4 \\
T(e_3) &= 3e_1 - e_3 & T(e_4) &= 3e_2 - e_4
\end{aligned}$$

Note that since $T(e_2) = T(e_4)$, the matrix representation of T has zero
determinant, and hence T is singular (either by Theorem 5.9 or Theorem 5.16).
Alternatively, we have $T(e_2 - e_4) = 0$ so that T must be singular since
$e_2 - e_4 \neq 0$. In any case, we now have

$$T^2(e_1) = T(e_1 + e_3) = T(e_1) + T(e_3) = 4e_1$$

so that

$$(T^2 - 4)(e_1) = (T - 2)(T + 2)(e_1) = 0.$$

Similarly

$$T^2(e_2) = T(3e_2 - e_4) = 3T(e_2) - T(e_4) = 6e_2 - 2e_4 = 2T(e_2)$$

so that

$$(T^2 - 2T)(e_2) = T(T - 2)(e_2) = 0.$$

Thus the minimal polynomial of $e_1$ is given by

$$m_1(x) = x^2 - 4 = (x - 2)(x + 2)$$

and the minimal polynomial of $e_2$ is given by

$$m_2(x) = x(x - 2).$$

That these are indeed minimal polynomials is clear if we note that in neither
case will a linear expression in T annihilate either $e_1$ or $e_2$ (just look at the
definition of T).

It should be obvious that the least common multiple of $m_1$ and $m_2$ is

$$x(x - 2)(x + 2) = x(x^2 - 4)$$

and hence (by Theorem 7.16) this is the minimal polynomial of some vector
$w \in \mathbb{R}^4$ which we now try to find. We know that $m_1(x) = x^2 - 4$ is the minimal
polynomial of $e_1$, but is x the minimal polynomial of some vector u? Since
$m_2(x) = x(x - 2)$ is the minimal polynomial of $e_2$, we see from Theorem 7.15
that the vector $u = (T - 2)(e_2) = e_2 - e_4$ has minimal polynomial x. Now, x and
$x^2 - 4$ are relatively prime so, as was done in the first part of the proof of
Theorem 7.16, we define the polynomials $h_1(x) = x/4$ and $k_1(x) = -1/4$ by the
requirement that $xh_1 + (x^2 - 4)k_1 = 1$. Hence

$$w = (T/4)(e_1) + (-1/4)(u) = (1/4)(e_1 - e_2 + e_3 + e_4)$$

is the vector with minimal polynomial $x(x^2 - 4)$.

We leave it to the reader to show that $T^2(e_3) = 4e_3$ and $T^2(e_4) = 2T(e_4)$,
and thus $e_3$ has minimal polynomial $m_3(x) = x^2 - 4$ and $e_4$ has minimal
polynomial $m_4(x) = x(x - 2)$. It is now easy to see that $m(x) = x(x^2 - 4)$ is the
minimal polynomial for T since m(x) has the property that $m(T)(e_i) = 0$ for
each i = 1, . . . , 4 and it is the least common multiple of the $m_i(x)$. We also
note that the constant term in m(x) is zero, and hence T is singular by
Theorem 7.5. ∆
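
The minimal polynomial of a vector can be found by computing $v, Tv, T^2v, \ldots$
until the newest vector is a linear combination of the earlier ones, exactly as in
the dependence argument at the start of this section. The sketch below does this
with exact rational arithmetic. It assumes the operator of Example 7.6 is given
by $T(e_1) = e_1 + e_3$, $T(e_2) = T(e_4) = 3e_2 - e_4$ and $T(e_3) = 3e_1 - e_3$ (the
values consistent with the computations in the example); the function names are
our own, and polynomials are stored as coefficient lists $[a_0, a_1, \ldots]$.

```python
from fractions import Fraction

def apply(T, v):
    """Apply the matrix T (a list of rows) to the vector v."""
    return [sum(Fraction(T[i][j]) * v[j] for j in range(len(v)))
            for i in range(len(T))]

def solve(columns, target):
    """Coefficients c with sum_k c[k] * columns[k] == target, or None.
    Assumes the given columns are linearly independent."""
    n, m = len(target), len(columns)
    rows = [[Fraction(columns[k][i]) for k in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    pivot_row = 0
    for col in range(m):
        pr = next((r for r in range(pivot_row, n) if rows[r][col] != 0), None)
        if pr is None:
            return None
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        piv = rows[pivot_row][col]
        rows[pivot_row] = [x / piv for x in rows[pivot_row]]
        for r in range(n):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y
                           for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(rows[r][m] != 0 for r in range(pivot_row, n)):
        return None          # target is not in the span of the columns
    return [rows[k][m] for k in range(m)]

def minimal_polynomial(T, v):
    """Coefficients [a_0, ..., a_{k-1}, 1] of the monic m_v with m_v(T)(v) = 0."""
    krylov = []
    current = [Fraction(x) for x in v]
    while True:
        coeffs = solve(krylov, current)
        if coeffs is not None:
            # T^k v = sum_i c_i T^i v, so m_v(x) = x^k - sum_i c_i x^i.
            return [-c for c in coeffs] + [Fraction(1)]
        krylov.append(current)
        current = apply(T, current)

# Columns of this matrix are T(e_1), ..., T(e_4) as assumed above.
T = [[1, 0, 3, 0],
     [0, 3, 0, 3],
     [1, 0, -1, 0],
     [0, -1, 0, -1]]
e1 = [1, 0, 0, 0]
e2 = [0, 1, 0, 0]
w = [Fraction(1, 4), Fraction(-1, 4), Fraction(1, 4), Fraction(1, 4)]

print(minimal_polynomial(T, e1))
print(minimal_polynomial(T, e2))
print(minimal_polynomial(T, w))
```

Under these assumptions the three results correspond to $x^2 - 4$, $x^2 - 2x$ and
$x^3 - 4x$, so the vector w found in the example does have the least common
multiple as its minimal polynomial.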

Example 7.7 Relative to the standard basis for $\mathbb{R}^3$, the matrix

$$A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 0 & -2 \\ 0 & 1 & 1 \end{pmatrix}$$

represents the operator $T \in L(\mathbb{R}^3)$ defined by $Te_1 = 2e_1 + e_2$, $Te_2 = -e_1 + e_3$
and $Te_3 = -2e_2 + e_3$. It is easy to see that $T^2e_1 = 3e_1 + 2e_2 + e_3$ and $T^3e_1 =
4e_1 + e_2 + 3e_3$, and hence the set $\{e_1, Te_1, T^2e_1\}$ is linearly independent, while
the four vectors $\{e_1, Te_1, T^2e_1, T^3e_1\}$ must be linearly dependent. This means
there exist $a, b, c, d \in \mathbb{R}$ such that

$$ae_1 + bTe_1 + cT^2e_1 + dT^3e_1 = 0.$$

This is equivalent to the system

$$\begin{aligned}
a + 2b + 3c + 4d &= 0 \\
b + 2c + d &= 0 \\
c + 3d &= 0
\end{aligned}$$

Since there is one free variable, choosing d = 1 we find a = -5, b = 5 and
c = -3. Therefore the minimal polynomial of $e_1$ is $m_1(x) = x^3 - 3x^2 + 5x - 5$.
Since the lcm of the minimal polynomials of $e_1$, $e_2$ and $e_3$ must be of degree
$\leq 3$, it follows that $m_1(x)$ is in fact the minimal polynomial m(x) of T (and hence
also of A). Note that by Theorem 7.5 we have $A(A^2 - 3A + 5I) = 5I$, and
hence $A^{-1} = (1/5)(A^2 - 3A + 5I)$ or

$$A^{-1} = \frac{1}{5}\begin{pmatrix} 2 & 1 & 2 \\ -1 & 2 & 4 \\ 1 & -2 & 1 \end{pmatrix}.$$
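
The inverse obtained from the minimal polynomial in Example 7.7 is easy to
confirm numerically. The sketch below assumes the matrix A whose columns are
$Te_1$, $Te_2$, $Te_3$ as listed in the example; mat_mul and combine are
hypothetical helper names introduced only for this check.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(terms):
    """Return the linear combination sum of c * M over (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

A = [[2, -1, 0],
     [1, 0, -2],
     [0, 1, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)

# m(A) = A^3 - 3A^2 + 5A - 5I should be the zero matrix.
m_of_A = combine([(1, A3), (-3, A2), (5, A), (-5, I3)])

# A^{-1} = (1/5)(A^2 - 3A + 5I), read off from m(A) = 0.
A_inv = combine([(Fraction(1, 5), A2), (Fraction(-3, 5), A), (Fraction(1), I3)])

print(m_of_A)
print(mat_mul(A, A_inv) == I3)
```

The same idea works for any nonsingular matrix: a minimal polynomial with
nonzero constant term always expresses the inverse as a polynomial in A.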



Exercises 

1. Show that $N_T$ is an ideal of $\mathcal{F}[x]$.

2. For each of the following linear operators $T \in L(V)$, find the minimal
polynomial of each of the (standard) basis vectors for V, find the minimal
polynomial of T, and find a vector whose minimal polynomial is the same
as that of T:

(a) $V = \mathbb{R}^2$: $Te_1 = e_2$, $Te_2 = e_1 + e_2$.




(b) $V = \mathbb{R}^2$: $Te_1 = 2e_1 - 3e_2$, $Te_2 = e_1 + 5e_2$.

(c) $V = \mathbb{R}^3$: $Te_1 = e_1 - e_2 + e_3$, $Te_2 = -2e_2 + 5e_3$, $Te_3 = 2e_1 + 3e_2$.

(d) $V = \mathbb{R}^3$: $Te_1 = 2e_2$, $Te_2 = 2e_1$, $Te_3 = 2e_3$.

(e) $V = \mathbb{R}^4$: $Te_1 = e_1 + e_3$, $Te_2 = 3e_4$, $Te_3 = e_1 - e_3$, $Te_4 = e_2$.

(f) $V = \mathbb{R}^4$: $Te_1 = e_1 + e_2$, $Te_2 = e_2 - e_3$, $Te_3 = e_3 + e_4$, $Te_4 = e_1 - e_4$.

3. For each of the following matrices, find its minimal polynomial over $\mathbb{R}$,
and then find its inverse if it exists:





(1 


-1^ 








( 4 


-1^ 








(a) 


.1 








ib) 




\ 
























^0 





\\ 






{ ^ 





0] 






(c) 


1 





-2 




id) 


-1 


3 













1 








. 5 


2 


\ 








l\ 


1 


0' 




' 1 







1 


0\ 




4 


1 










1 





1 


id) 










ie) 


















-3 


-1 


1 










.0 





1 


4j 




. 







1 


1. 



7.5 INVARIANT SUBSPACES 

Recall that two matrices $A, B \in M_n(\mathcal{F})$ are said to be similar if there exists a
nonsingular matrix $P \in M_n(\mathcal{F})$ such that $B = P^{-1}AP$. As was shown in
Exercise 5.4.1, this defines an equivalence relation on $M_n(\mathcal{F})$. Since L(V) and
$M_n(\mathcal{F})$ are isomorphic (Theorem 5.13), this definition applies equally well to
linear operators. We call the equivalence class of a matrix (or linear operator)
defined by this similarity relation its similarity class. While Theorem 7.14
gave us a condition under which a matrix may be diagonalized, this form is
not possible to achieve in general. The approach we shall now take is to look
for a basis in which the matrix of some linear operator in a given similarity
class has a particularly simple standard form. As mentioned at the beginning
of this chapter, these representations are called canonical forms. Of the many
possible canonical forms, we shall consider only several of the more useful
and important forms in this book. We begin with a discussion of some
additional types of subspaces. A complete discussion of canonical forms under
similarity is given in Chapter 8.
Suppose $T \in L(V)$ and let W be a subspace of V. Then W is said to be
invariant under T (or simply T-invariant) if $T(w) \in W$ for every $w \in W$.

For example, if $V = \mathbb{R}^3$ then the xy-plane is invariant under the linear
transformation that rotates every vector in $\mathbb{R}^3$ about the z-axis. As another example,
note that if $v \in V$ is an eigenvector of T, then $T(v) = \lambda v$ for some $\lambda \in \mathcal{F}$, and
hence v generates a one-dimensional subspace of V that is invariant under T
(this is not necessarily the same as the eigenspace of $\lambda$).

Another way to describe the invariance of W under T is to say that
$T(W) \subset W$. Then clearly $T^2(W) = T(T(W)) \subset W$, and in general $T^n(W) \subset W$
for every n = 1, 2, . . . . Since W is a subspace of V, this means that
$f(T)(W) \subset W$ for any $f(x) \in \mathcal{F}[x]$. In other words, if W is invariant under T,
then W is also invariant under any polynomial in T (over the same field as W).

If $W \subset V$ is T-invariant, we may define the restriction of T to W in the
usual way as that operator $T|W: W \to W$ defined by $(T|W)(w) = T(w)$ for
every $w \in W$. We will frequently write $T_W$ instead of $T|W$.

Theorem 7.17 Suppose $T \in L(V)$ and let W be a T-invariant subspace of V.
Then

(a) $f(T_W)(w) = f(T)(w)$ for any $f(x) \in \mathcal{F}[x]$ and $w \in W$.

(b) The minimal polynomial $m'(x)$ for $T_W$ divides the minimal
polynomial m(x) for T.

Proof This is Exercise 7.5.2. ■

If $T \in L(V)$ and $f(x) \in \mathcal{F}[x]$, then f(T) is also a linear operator on V, and
hence we may define the kernel (or null space) of f(T) in the usual way by

$$\operatorname{Ker} f(T) = \{v \in V : f(T)(v) = 0\}.$$

Theorem 7.18 If $T \in L(V)$ and $f(x) \in \mathcal{F}[x]$, then Ker f(T) is a T-invariant
subspace of V.

Proof Recall from Section 5.2 that Ker f(T) is a subspace of V. To show that
Ker f(T) is T-invariant, we must show that $Tv \in \operatorname{Ker} f(T)$ for any $v \in
\operatorname{Ker} f(T)$, i.e., f(T)(Tv) = 0. But using Theorem 7.2(a) we see that

$$f(T)(Tv) = T(f(T)(v)) = T(0) = 0$$

as desired. ■

Now suppose $T \in L(V)$ and let $W \subset V$ be a T-invariant subspace.
Furthermore let $\{v_1, v_2, \ldots, v_n\}$ be a basis for V, where the first m < n
vectors form a basis for W. If $A = (a_{ij})$ is the matrix representation of T relative to
this basis for V, then a little thought should convince you that A must be of
the block matrix form

$$A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix}$$

where $a_{ij} = 0$ for $j \leq m$ and $i > m$. This is because $T(w) \in W$ and any $w \in W$
has components $(w_1, \ldots, w_m, 0, \ldots, 0)$ relative to the above basis for V. The
formal proof of this fact is given in the following theorem.

Theorem 7.19 Let W be a subspace of V and suppose $T \in L(V)$. Then W is
T-invariant if and only if T can be represented in the block matrix form

$$A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix}$$

where B is a matrix representation of $T_W$.

Proof First suppose that W is T-invariant. Choose a basis $\{v_1, \ldots, v_m\}$ for
W, and extend this to a basis $\{v_1, \ldots, v_m, v_{m+1}, \ldots, v_n\}$ for V (see Theorem
2.10). Then, since $T(v_i) \in W$ for each i = 1, . . . , m, there exist scalars $b_{ij}$ such
that

$$T_W(v_i) = T(v_i) = v_1 b_{1i} + \cdots + v_m b_{mi}$$

for each i = 1, . . . , m. In addition, since $T(v_i) \in V$ for each i = m + 1, . . . , n,
there also exist scalars $c_{ij}$ and $d_{ij}$ such that

$$T(v_i) = v_1 c_{1i} + \cdots + v_m c_{mi} + v_{m+1} d_{m+1,i} + \cdots + v_n d_{ni}$$

for each i = m + 1, . . . , n.

From Theorem 5.11, we see that the matrix representation of T is given by
an n x n matrix A of the form

$$A = \begin{pmatrix} B & C \\ 0 & D \end{pmatrix}$$

where B is an m x m matrix that represents $T_W$, C is an m x (n - m) matrix,
and D is an (n - m) x (n - m) matrix.

Conversely, if A has the stated form and $\{v_1, \ldots, v_n\}$ is a basis for V,
then the subspace W of V defined by vectors of the form

$$w = \sum_{i=1}^{m} \alpha_i v_i$$

where each $\alpha_i \in \mathcal{F}$ will be invariant under T. Indeed, for each i = 1, . . . , m we
have

$$T(v_i) = \sum_{j=1}^{n} v_j a_{ji} = v_1 b_{1i} + \cdots + v_m b_{mi} \in W$$

and hence $T(w) \in W$. ■

Corollary Suppose $T \in L(V)$ and W is a T-invariant subspace of V. Then the
characteristic polynomial of $T_W$ divides the characteristic polynomial of T.

Proof See Exercise 7.5.3. ■

Recall from Theorem 2.18 that the orthogonal complement $W^\perp$ of a set
$W \subset V$ is a subspace of V. If W is a subspace of V and both W and $W^\perp$ are
T-invariant, then since $V = W \oplus W^\perp$ (Theorem 2.22), a little more thought
should convince you that the matrix representation of T will now be of the
block diagonal form

$$A = \begin{pmatrix} B & 0 \\ 0 & D \end{pmatrix}.$$

We now proceed to discuss a variation of Theorem 7.19 in which we take into
account the case where V can be decomposed into a direct sum of subspaces.

Let us assume that $V = W_1 \oplus \cdots \oplus W_r$ where each $W_i$ is a T-invariant
subspace of V. Then we define the restriction of T to $W_i$ to be the operator
$T_i = T_{W_i} = T|W_i$. In other words, $T_i(w_i) = T(w_i) \in W_i$ for any $w_i \in W_i$. Given
any $v \in V$ we have $v = v_1 + \cdots + v_r$ where $v_i \in W_i$ for each i = 1, . . . , r, and
hence

$$T(v) = \sum_{i=1}^{r} T(v_i) = \sum_{i=1}^{r} T_i(v_i).$$

This shows that T is completely determined by the effect of each $T_i$ on $W_i$. In
this case we call T the direct sum of the $T_i$ and we write

$$T = T_1 \oplus \cdots \oplus T_r.$$

We also say that T is reducible (or decomposable) into the operators $T_i$, and
the spaces $W_i$ are said to reduce T, or to form a T-invariant direct sum
decomposition of V. In other words, T is reducible if there exists a basis for
V such that $V = W_1 \oplus \cdots \oplus W_r$ and each $W_i$ is T-invariant.

For each i = 1, . . . , r we let $B_i = \{w_{i1}, \ldots, w_{in_i}\}$ be a basis for $W_i$ so that
$B = \bigcup B_i$ is a basis for $V = W_1 \oplus \cdots \oplus W_r$ (Theorem 2.15). We also let $A_i =
(a_{i,kj})$ be the matrix representation of $T_i$ with respect to the basis $B_i$ (where k
and j label the rows and columns respectively of the matrix $A_i$). Therefore we
see that

$$T(w_{ij}) = T_i(w_{ij}) = \sum_{k=1}^{n_i} w_{ik} a_{i,kj}$$

where i = 1, . . . , r and j = 1, . . . , $n_i$. If A is the matrix representation of T
with respect to the basis $B = \{w_{11}, \ldots, w_{1n_1}, \ldots, w_{r1}, \ldots, w_{rn_r}\}$ for V, then
since the ith column of A is just the image of the ith basis vector under T, we
see that A must be of the block diagonal form

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}.$$

If this is not immediately clear, then a minute's thought should help, keeping
in mind that each $A_i$ is an $n_i \times n_i$ matrix, and A is an n x n matrix where
$n = \sum_{i=1}^{r} n_i$. It is also helpful to think of the elements of B as being numbered from
1 to n rather than by the confusing double subscripts (also refer to the proof of
Theorem 7.19).

The matrix A is called the direct sum of the matrices $A_1, \ldots, A_r$ and we
write

$$A = A_1 \oplus \cdots \oplus A_r.$$

In this case we also say that the matrix A is reducible. Thus a representation
[T] of T is reducible if there exists a basis for V in which [T] is block
diagonal. (Some authors say that a representation is reducible if there exists a basis
for V in which the matrix of T is triangular. In this case, if there exists a basis
for V in which the matrix is block diagonal, then the representation is said to
be completely reducible. We shall not follow this convention.) This
discussion proves the following theorem.

Theorem 7.20 Suppose $T \in L(V)$ and assume that $V = W_1 \oplus \cdots \oplus W_r$
where each $W_i$ is T-invariant. If $A_i$ is the matrix representation of $T_i = T|W_i$,
then the matrix representation of T is given by the matrix $A = A_1 \oplus \cdots \oplus A_r$.

Corollary Suppose $T \in L(V)$ and $V = W_1 \oplus \cdots \oplus W_r$ where each $W_i$ is
T-invariant. If $\Delta_T(x)$ is the characteristic polynomial for T and $\Delta_i(x)$ is the
characteristic polynomial for $T_i = T|W_i$, then $\Delta_T(x) = \Delta_1(x) \cdots \Delta_r(x)$.

Proof See Exercise 7.5.4. ■

Example 7.8 Referring to Example 2.8, consider the space $V = \mathbb{R}^3$. We write
$V = W_1 \oplus W_2$ where $W_1 = \mathbb{R}^2$ (the xy-plane) and $W_2 = \mathbb{R}^1$ (the z-axis). Note
that $W_1$ has basis vectors $w_{11} = (1, 0, 0)$ and $w_{12} = (0, 1, 0)$, and $W_2$ has basis
vector $w_{21} = (0, 0, 1)$.

Now let $T \in L(V)$ be the linear operator that rotates any $v \in V$
counterclockwise by an angle $\theta$ about the z-axis. Then clearly both $W_1$ and $W_2$ are
T-invariant. Letting $\{e_i\}$ be the standard basis for $\mathbb{R}^3$, we have $T_i = T|W_i$ and
consequently (see Example 1.2),

$$\begin{aligned}
T_1(e_1) &= T(e_1) = (\cos\theta)e_1 + (\sin\theta)e_2 \\
T_1(e_2) &= T(e_2) = (-\sin\theta)e_1 + (\cos\theta)e_2 \\
T_2(e_3) &= T(e_3) = e_3
\end{aligned}$$

Thus $V = W_1 \oplus W_2$ is a T-invariant direct sum decomposition of V, and T is
the direct sum of $T_1$ and $T_2$. It should be clear that the matrix representation of
T is given by

$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

which is just the direct sum of the matrix representations of $T_1$ and $T_2$. ∆

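
The block diagonal structure of a direct sum is simple to build explicitly. The
following sketch assembles the direct sum of square blocks and compares it with
the rotation matrix of Example 7.8 for one sample angle; direct_sum and
rotation_about_z are names chosen here for illustration.

```python
import math

def direct_sum(*blocks):
    """Place square blocks along the diagonal and fill the rest with zeros."""
    n = sum(len(b) for b in blocks)
    result = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(block)
    return result

def rotation_about_z(theta):
    """Matrix of the counterclockwise rotation by theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

theta = math.pi / 6            # sample angle, chosen arbitrarily
c, s = math.cos(theta), math.sin(theta)
A1 = [[c, -s], [s, c]]         # matrix of T_1 on the xy-plane
A2 = [[1.0]]                   # matrix of T_2 on the z-axis

print(direct_sum(A1, A2) == rotation_about_z(theta))
```

The same function accepts any number of blocks, matching the general form
$A = A_1 \oplus \cdots \oplus A_r$ of Theorem 7.20.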
Exercises 

1. Suppose $V = W_1 \oplus W_2$ and let $T_1: W_1 \to V$ and $T_2: W_2 \to V$ be linear.
Show that $T = T_1 \oplus T_2$ is linear.

2. Prove Theorem 7.17.

3. Prove the corollary to Theorem 7.19.

4. Prove the corollary to Theorem 7.20.

5. Let V be a finite-dimensional inner product space over $\mathbb{C}$, and let G be a
finite group. If for each $g \in G$ there is a $U(g) \in L(V)$ such that

$$U(g_1)U(g_2) = U(g_1 g_2)$$

then the collection U(G) = {U(g)} is said to form a representation of G.
If W is a subspace of V with the property that $U(g)(W) \subset W$ for all
$g \in G$, then we will say that W is U(G)-invariant (or simply invariant).
Furthermore, we say that U(G) is irreducible if there is no nontrivial
U(G)-invariant subspace (i.e., the only invariant subspaces are {0} and V
itself).

(a) Prove Schur's lemma 1: Let U(G) be an irreducible representation of
G on V. If $A \in L(V)$ is such that AU(g) = U(g)A for all $g \in G$, then
$A = \lambda 1$ where $\lambda \in \mathbb{C}$. [Hint: Let $\lambda$ be an eigenvalue of A with corresponding
eigenspace $V_\lambda$. Show that $V_\lambda$ is U(G)-invariant.]

(b) If $S \in L(V)$ is nonsingular, show that $U'(G) = SU(G)S^{-1}$ is also a
representation of G on V. (Two representations of G related by such a
similarity transformation are said to be equivalent.)

(c) Prove Schur's lemma 2: Let U(G) and U'(G) be two irreducible
representations of G on V and V' respectively, and suppose $A \in L(V', V)$ is
such that AU'(g) = U(g)A for all $g \in G$. Then either A = 0, or else A is an
isomorphism of V' onto V so that $A^{-1}$ exists and U(G) is equivalent to
U'(G). [Hint: Show that Im A is invariant under U(G), and that Ker A is
invariant under U'(G).]

6. Suppose $A \in M_n(\mathcal{F})$ has minimal polynomial $m_A$ and $B \in M_m(\mathcal{F})$ has
minimal polynomial $m_B$. Let $m_{A \oplus B}$ be the minimal polynomial for $A \oplus B$
and let $p = \operatorname{lcm}\{m_A, m_B\}$. Prove $m_{A \oplus B} = p$.

7. Let W be a T-invariant subspace of a finite-dimensional vector space V
over $\mathcal{F}$ and suppose $v \in V$. Define the set

$$N_T(v, W) = \{f \in \mathcal{F}[x] : f(T)v \in W\}.$$

(a) Show that $N_T(v, W)$ is an ideal of $\mathcal{F}[x]$. This means that $N_T(v, W)$
has a unique monic generator $c_v(x)$ which is called the T-conductor of v
into W.

(b) Show that every T-conductor divides the minimal polynomial m(x)
for T.

(c) Now suppose the minimal polynomial for T is of the form

$$m(x) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_r)^{n_r}$$

and let W be a proper T-invariant subspace of V. Prove there exists a
vector $v \in V$ with $v \notin W$ such that $(T - \lambda 1)v \in W$ where $\lambda$ is an
eigenvalue of T. [Hint: Suppose $v_1 \in V$ with $v_1 \notin W$. Show that the
T-conductor $c_{v_1}$ of $v_1$ into W is of the form $c_{v_1}(x) = (x - \lambda)d(x)$. Now consider the
vector $v = d(T)v_1$.]

8. Let V be finite-dimensional over $\mathcal{F}$ and suppose $T \in L(V)$. Prove there
exists a basis for V in which the matrix representation A of T is
upper-triangular if and only if the minimal polynomial for T is of the form
$m(x) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_r)^{n_r}$ where each $n_i \in \mathbb{Z}^+$ and the $\lambda_i$ are the
eigenvalues of T. [Hint: Apply part (c) of the previous problem to the
basis $v_1, \ldots, v_n$ in which A is upper-triangular. Start with W = {0} to get
$v_1$, then consider the span of $v_1$ to get $v_2$, and continue this process.]

9. Relative to the standard basis for $\mathbb{R}^2$, let $T \in L(\mathbb{R}^2)$ be represented by



(a) Prove that the only T-invariant subspaces of $\mathbb{R}^2$ are {0} and $\mathbb{R}^2$ itself.

(b) Suppose $U \in L(\mathbb{C}^2)$ is also represented by A. Show that there exist
one-dimensional U-invariant subspaces.

10. Find all invariant subspaces over $\mathbb{R}$ of the operator represented by



11. (a) Suppose $T \in L(V)$, and let $v \in V$ be arbitrary. Define the set of
vectors

\[ Z(v, T) = \{ f(T)(v) : f \in \mathcal{F}[x] \} . \]

Show that Z(v, T) is a T-invariant subspace of V. (This is called the
T-cyclic subspace generated by v.)

(b) Let v have minimal polynomial $m_v(x) = x^r + a_{r-1}x^{r-1} + \cdots + a_1 x + a_0$.
Prove that Z(v, T) has a basis $\{v, Tv, \ldots, T^{r-1}v\}$, and hence also that
dim Z(v, T) = deg $m_v(x)$. [Hint: Show that $T^k v$ is a linear combination of
$\{v, Tv, \ldots, T^{r-1}v\}$ for every integer $k \geq r$.]

(c) Let $T \in L(\mathbb{R}^3)$ be represented in the standard basis by

[The entries of the matrix A are not recoverable from the source.]

If $v = e_1 - e_2$, find the minimal polynomial of v and a basis for Z(v, T).
Extend this to a basis for $\mathbb{R}^3$, and show that the matrix of T relative to this
basis is block triangular.

7.6 THE PRIMARY DECOMPOSITION THEOREM 

We now proceed to show that there exists an important relationship between
the minimal polynomial of a linear operator $T \in L(V)$ and the conditions
under which V can be written as a direct sum of T-invariant subspaces of V.
In the present context, this theorem (the primary decomposition theorem) is
best obtained by first proving two relatively simple preliminary results.

Theorem 7.21 Suppose $T \in L(V)$ and assume that $f \in \mathcal{F}[x]$ is a polynomial
such that $f(T) = 0$ and $f = h_1 h_2$ where $h_1$ and $h_2$ are relatively prime. Define
$W_1 = \mathrm{Ker}\, h_1(T)$ and $W_2 = \mathrm{Ker}\, h_2(T)$. Then $W_1$ and $W_2$ are T-invariant subspaces,
and $V = W_1 \oplus W_2$.

Proof We first note that $W_1$ and $W_2$ are T-invariant subspaces according to
Theorem 7.18. Next, since $h_1$ and $h_2$ are relatively prime, there exist
polynomials $g_1$ and $g_2$ such that $g_1 h_1 + g_2 h_2 = 1$ (Theorem 6.5), and hence

\[ g_1(T)h_1(T) + g_2(T)h_2(T) = 1 . \qquad (*) \]

Then for any $v \in V$ we have

\[ g_1(T)h_1(T)(v) + g_2(T)h_2(T)(v) = v . \]

However

\[
\begin{aligned}
h_2(T)g_1(T)h_1(T)(v) &= g_1(T)h_1(T)h_2(T)(v) \\
&= g_1(T)f(T)(v) \\
&= 0
\end{aligned}
\]

so that $g_1(T)h_1(T)(v) \in \mathrm{Ker}\, h_2(T) = W_2$. Similarly, it is easy to see that
$g_2(T)h_2(T)(v) \in W_1$, and hence $v \in W_1 + W_2$. This shows that $V = W_1 + W_2$,
and it remains to be shown that this sum is direct.

To show that this sum is direct we use Theorem 2.14. In other words, if
$v = w_1 + w_2$ where $w_i \in W_i$, then we must show that each $w_i$ is uniquely
determined by v. Applying $g_1(T)h_1(T)$ to $v = w_1 + w_2$ and using the fact that
$h_1(T)(w_1) = 0$ we obtain

\[ g_1(T)h_1(T)(v) = g_1(T)h_1(T)(w_2) . \]

Next, we apply (*) to $w_2$ and use the fact that $h_2(T)(w_2) = 0$ to obtain

\[ g_1(T)h_1(T)(w_2) = w_2 . \]

Combining these last two equations shows that $w_2 = g_1(T)h_1(T)(v)$, and thus
$w_2$ is uniquely determined by v (through the action of $g_1(T)h_1(T)$). We leave it
to the reader to show in a similar manner that $w_1 = g_2(T)h_2(T)(v)$. Therefore
$v = w_1 + w_2$ is a unique decomposition, and hence $V = W_1 \oplus W_2$. ■
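
As a small numerical illustration of this proof, the following sketch builds the two operators g1(T)h1(T) and g2(T)h2(T) for one concrete case and checks that they split the identity as in (*). The matrix T, the choice h1 = x - 1, h2 = x - 2 (so that g1 = 1 and g2 = -1, since (x - 1) - (x - 2) = 1), and all helper names are assumptions made for this example only; they do not come from the text.

```python
# Illustrative check of Theorem 7.21 on a hypothetical 2 x 2 example.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B, c=1):
    """Return the matrix A + c*B."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I = [[1, 0], [0, 1]]
Z = [[0, 0], [0, 0]]
T = [[1, 1], [0, 2]]           # assumed operator; f = (x - 1)(x - 2) annihilates it

H1 = add(T, I, -1)             # h1(T) = T - 1
H2 = add(T, I, -2)             # h2(T) = T - 2

P2 = H1                        # g1(T)h1(T) with g1 = 1; its image lies in W2 = Ker h2(T)
P1 = add(Z, H2, -1)            # g2(T)h2(T) with g2 = -1; its image lies in W1 = Ker h1(T)

print(matmul(H1, H2) == Z)     # f(T) = 0
print(add(P1, P2) == I)        # equation (*)
print(matmul(H2, P2) == Z)     # g1(T)h1(T)(v) is in W2 for every v
print(matmul(H1, P1) == Z)     # g2(T)h2(T)(v) is in W1 for every v
```

Each column of P1 and P2 is the component of a standard basis vector in W1 and W2 respectively, which is exactly the unique decomposition v = w1 + w2 obtained in the proof.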

Theorem 7.22 Suppose $T \in L(V)$, and let $f = h_1 h_2 \in \mathcal{F}[x]$ be the (monic)
minimal polynomial for T, where $h_1$ and $h_2$ are relatively prime. If $W_i =
\mathrm{Ker}\, h_i(T)$, then $h_i$ is the (monic) minimal polynomial for $T_i = T|W_i$.

Proof For each i = 1, 2 let $m_i$ be the (monic) minimal polynomial for $T_i$.
Since $W_i = \mathrm{Ker}\, h_i(T)$, Theorem 7.17(a) tells us that $h_i(T_i) = 0$, and therefore
(by Theorem 7.4) we must have $m_i | h_i$. This means that $m_i | f$ (since $f = h_1 h_2$),
and hence f is a multiple of $m_1$ and $m_2$. From the definition of least common
multiple, it follows that the lcm of $m_1$ and $m_2$ must divide f. Since $h_1$ and $h_2$
are relatively prime, $m_1$ and $m_2$ must also be relatively prime (because if $m_1$
and $m_2$ had a common factor, then $h_1$ and $h_2$ would each also have this same
common factor since $m_i | h_i$). But $m_1$ and $m_2$ are monic, and hence their greatest
common divisor is 1. Therefore the lcm of $m_1$ and $m_2$ is just $m_1 m_2$ (Theorem
6.9). This shows that $m_1 m_2 | f$.

On the other hand, since $V = W_1 \oplus W_2$ (Theorem 7.21), we see that for
any $v \in V$

\[
\begin{aligned}
[m_1(T)m_2(T)](v) &= [m_1(T)m_2(T)](w_1 + w_2) \\
&= m_2(T)[m_1(T)(w_1)] + m_1(T)[m_2(T)(w_2)] \\
&= m_2(T)[m_1(T_1)(w_1)] + m_1(T)[m_2(T_2)(w_2)] \\
&= 0
\end{aligned}
\]

because $m_i$ is the minimal polynomial for $T_i$. Therefore $(m_1 m_2)(T) = 0$, and
hence $f | m_1 m_2$ (from Theorem 7.4 since, by hypothesis, f is the minimal
polynomial for T). Combined with the fact that $m_1 m_2 | f$, this shows that $f = m_1 m_2$
(since $m_1$, $m_2$ and f are monic). This result, along with the definition $f = h_1 h_2$,
the fact that $h_1$ and $h_2$ are monic, and the fact that $m_i | h_i$ shows that $m_i = h_i$. ■

We are now in a position to prove the main result of this section.

Theorem 7.23 (Primary Decomposition Theorem) Suppose $T \in L(V)$ has
minimal polynomial

\[ m(x) = f_1(x)^{n_1} f_2(x)^{n_2} \cdots f_r(x)^{n_r} \]

where each $f_i(x)$ is a distinct monic prime polynomial and each $n_i$ is a positive
integer. Let $W_i = \mathrm{Ker}\, f_i(T)^{n_i}$, and define $T_i = T|W_i$. Then V is the direct sum
of the T-invariant subspaces $W_i$, and $f_i(x)^{n_i}$ is the minimal polynomial for $T_i$.

Proof If r = 1 the theorem is trivial since $W_1 = \mathrm{Ker}\, f_1(T)^{n_1} = \mathrm{Ker}\, m(T) = V$.
We now assume that the theorem has been proved for some $r - 1 \geq 1$, and
proceed by induction to show that it is true for r. We first remark that the $W_i$
are T-invariant subspaces by Theorem 7.18. Define the T-invariant subspace

\[ U = \mathrm{Ker}[f_2(T)^{n_2} \cdots f_r(T)^{n_r}] . \]

Because the $f_i(x)^{n_i}$ are relatively prime (by Corollary 2 of Theorem 6.5, since
the $f_i(x)$ are all distinct primes), we can apply Theorem 7.21 to write $V =
W_1 \oplus U$. In addition, since m(x) is the minimal polynomial for T, Theorem
7.22 tells us that $f_1(x)^{n_1}$ is the minimal polynomial for $T_1$, and
$[f_2(x)^{n_2} \cdots f_r(x)^{n_r}]$ is the minimal polynomial for $T_U = T|U$.

Applying our induction hypothesis, we find that $U = W_2 \oplus \cdots \oplus W_r$
where for each i = 2, . . . , r we have $W_i = \mathrm{Ker}\, f_i(T_U)^{n_i}$, and $f_i(x)^{n_i}$ is the
minimal polynomial for $T_i = T_U|W_i$. However, it is obvious that $f_i(x)^{n_i}$ divides
$[f_2(x)^{n_2} \cdots f_r(x)^{n_r}]$ for each i = 2, . . . , r and hence $\mathrm{Ker}\, f_i(T)^{n_i} \subset U$.
Specifically, this means that every vector $v \in V$ with the property that
$f_i(T)^{n_i}(v) = 0$ is also in U, and therefore $\mathrm{Ker}\, f_i(T)^{n_i} = \mathrm{Ker}\, f_i(T_U)^{n_i} = W_i$.
Furthermore, $T|W_i = T_U|W_i = T_i$ and thus $f_i(x)^{n_i}$ is also the minimal
polynomial for $T|W_i$.

Summarizing, we have shown that $V = W_1 \oplus U = W_1 \oplus W_2 \oplus \cdots \oplus W_r$
where $W_i = \mathrm{Ker}\, f_i(T)^{n_i}$ for each i = 1, . . . , r and $f_i(x)^{n_i}$ is the minimal
polynomial for $T|W_i = T_i$. This completes the induction procedure and proves the
theorem. ■

To make this result somewhat more transparent, and to aid actual
calculations, we go back and look carefully at what we have done in
defining the spaces $W_i = \mathrm{Ker}\, f_i(T)^{n_i}$. For each i = 1, . . . , r we define the
polynomials $g_i(x)$ by

\[ m(x) = f_i(x)^{n_i} g_i(x) . \]

In other words, $g_i(x)$ is a product of the r - 1 factors $f_j(x)^{n_j}$ with $j \neq i$. We
claim that in fact

\[ W_i = g_i(T)(V) . \]

It is easy to see that $g_i(T)(V) \subset W_i$ because

\[ f_i(T)^{n_i}[g_i(T)(v)] = [f_i(T)^{n_i} g_i(T)](v) = m(T)(v) = 0 \]

for every $v \in V$. On the other hand, $f_i(x)^{n_i}$ and $g_i(x)$ are monic relative primes,
and hence (by Theorem 6.5) there exist polynomials $a(x), b(x) \in \mathcal{F}[x]$ such
that $a(x)f_i(x)^{n_i} + b(x)g_i(x) = 1$. Then for any $v_i \in W_i = \mathrm{Ker}\, f_i(T)^{n_i}$ we have
$f_i(T)^{n_i}(v_i) = 0$, and hence

\[ v_i = a(T)[f_i(T)^{n_i}(v_i)] + g_i(T)[b(T)(v_i)] = 0 + g_i(T)[b(T)(v_i)] \in g_i(T)(V) . \]

Hence $W_i \subset g_i(T)(V)$, and therefore $W_i = g_i(T)(V)$ as claimed. This gives us,
at least conceptually, a practical method for computing the matrix of a linear
transformation T with respect to the bases of the T-invariant subspaces.

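As a final remark, note that for any $j \neq i$ we have $g_i(T)(W_j) = 0$ because
$g_i(T)$ contains $f_j(T)^{n_j}$ and $W_j = \mathrm{Ker}\, f_j(T)^{n_j}$. In addition, since $W_i$ is T-invariant
we see that $g_i(T)(W_i) \subset W_i$. But for any $w_i \in W_i$ we also have

\[
\begin{aligned}
w_i &= a(T)[f_i(T)^{n_i}(w_i)] + g_i(T)[b(T)(w_i)] \\
&= 0 + g_i(T)[b(T)(w_i)]
\end{aligned}
\]

where $b(T)(w_i) \in W_i$ because $W_i$ is T-invariant, and hence $g_i(T)(W_i) = W_i$. This
should not be surprising for the following reason. If we write

\[ V_i = W_1 \oplus \cdots \oplus W_{i-1} \oplus W_{i+1} \oplus \cdots \oplus W_r \]

then $g_i(T)(V_i) = 0$, and therefore

\[ W_i = g_i(T)(V) = g_i(T)(W_i \oplus V_i) = g_i(T)(W_i) . \]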

Example 7.9 Consider the space $V = \mathbb{R}^3$ with basis $\{u_1, u_2, u_3\}$ and define
the linear transformation $T \in L(V)$ by

\[
\begin{aligned}
T(u_1) &= u_2 \\
T(u_2) &= u_3 \\
T(u_3) &= -2u_1 + 3u_2 .
\end{aligned}
\]

Then

\[ T^2(u_1) = T(u_2) = u_3 \]

and hence

\[ T^3(u_1) = T(u_3) = -2u_1 + 3u_2 = -2u_1 + 3T(u_1) . \]

Therefore $T^3(u_1) - 3T(u_1) + 2u_1 = 0$ so that the minimal polynomial of $u_1$ is
given by

\[ m_1(x) = x^3 - 3x + 2 = (x - 1)^2(x + 2) . \]

Now recall that the minimal polynomial m(x) for T is just the least common
multiple of the minimal polynomials $m_i(x)$ of the basis vectors for V, and
$\deg m(x) \leq \dim V = n$ (see the discussion prior to Example 7.6). Since $m_1(x)$ is
written as a product of prime polynomials with $\deg m_1 = 3 = \dim V$, it follows
that $m(x) = m_1(x)$. We thus have $f_1(x)^{n_1} = (x - 1)^2$ and $f_2(x)^{n_2} = (x + 2)$.
We now define

\[ W_1 = g_1(T)(V) = (T + 2)(V) \]

and

\[ W_2 = g_2(T)(V) = (T - 1)^2(V) . \]

A simple calculation shows that

\[
\begin{aligned}
(T + 2)u_1 &= 2u_1 + u_2 \\
(T + 2)u_2 &= 2u_2 + u_3 \\
(T + 2)u_3 &= -2u_1 + 3u_2 + 2u_3 = (T + 2)(-u_1 + 2u_2) .
\end{aligned}
\]

Therefore $W_1$ is spanned by the basis vectors $\{2u_1 + u_2,\ 2u_2 + u_3\}$. Similarly, it
is easy to show that $(T - 1)^2 u_1 = u_1 - 2u_2 + u_3$ and that both $(T - 1)^2 u_2$ and
$(T - 1)^2 u_3$ are multiples of this. Hence $\{u_1 - 2u_2 + u_3\}$ is the basis vector for
$W_2$.

We now see that $T_1 = T|W_1$ and $T_2 = T|W_2$ yield the transformations

\[
\begin{aligned}
T_1(2u_1 + u_2) &= 2u_2 + u_3 \\
T_1(2u_2 + u_3) &= -2u_1 + 3u_2 + 2u_3 = -(2u_1 + u_2) + 2(2u_2 + u_3) \\
T_2(u_1 - 2u_2 + u_3) &= -2(u_1 - 2u_2 + u_3)
\end{aligned}
\]

and hence $T_1$ and $T_2$ are represented by the matrices $A_1$ and $A_2$, respectively,
given by

\[ A_1 = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix} \qquad A_2 = (-2) . \]

Therefore $T = T_1 \oplus T_2$ is represented by the matrix $A = A_1 \oplus A_2$ where

\[ A = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & -2 \end{pmatrix} . \]

From Corollary 1 of Theorem 7.14 we know that a matrix $A \in M_n(\mathcal{F})$ is
diagonalizable if and only if A has n linearly independent eigenvectors, and
from Theorem 7.13, the corresponding distinct eigenvalues $\lambda_1, \ldots, \lambda_r$ (where
$r \leq n$) must be roots of the minimal polynomial for A. The factor theorem
(corollary to Theorem 6.4) then says that $x - \lambda_i$ is a factor of the minimal
polynomial for A, and hence the minimal polynomial for A must contain at
least r distinct linear factors if A is to be diagonalizable.

We now show that the minimal polynomial of a diagonalizable linear
transformation consists precisely of distinct linear factors.

Theorem 7.24 A linear transformation $T \in L(V)$ is diagonalizable if and
only if the minimal polynomial m(x) for T is of the form

\[ m(x) = (x - \lambda_1) \cdots (x - \lambda_r) \]

where $\lambda_1, \ldots, \lambda_r$ are the distinct eigenvalues of T.

Proof Suppose that $m(x) = (x - \lambda_1) \cdots (x - \lambda_r)$ where $\lambda_1, \ldots, \lambda_r \in \mathcal{F}$ are
distinct. Then, according to Theorem 7.23, $V = W_1 \oplus \cdots \oplus W_r$ where $W_i =
\mathrm{Ker}(T - \lambda_i 1)$. But then for any $w \in W_i$ we have

\[ 0 = (T - \lambda_i 1)(w) = T(w) - \lambda_i w \]

and hence any $w \in W_i$ is an eigenvector of T with eigenvalue $\lambda_i$. It should be
clear that any eigenvector of T with eigenvalue $\lambda_i$ is also in $W_i$. In particular,
this means that any basis vector of $W_i$ is also an eigenvector of T. By
Theorem 2.15, the union of the bases of all the $W_i$ is a basis for V, and hence
V has a basis consisting entirely of eigenvectors. This means that T is
diagonalizable (Theorem 7.14).

On the other hand, assume that T is diagonalizable, and hence V has a
basis $\{v_1, \ldots, v_n\}$ of eigenvectors of T that correspond to the (not necessarily
distinct) eigenvalues $\lambda_1, \ldots, \lambda_n$. If the $v_i$ are numbered so that $\lambda_1, \ldots, \lambda_r$ are
the distinct eigenvalues of T, then the operator

\[ f(T) = (T - \lambda_1 1) \cdots (T - \lambda_r 1) \]

has the property that $f(T)(v_i) = 0$ for each of the basis eigenvectors $v_1, \ldots, v_n$.
Thus $f(T) = 0$ and the minimal polynomial m(x) for T must divide f(x)
(Theorem 7.4). While this shows that m(x) must consist of linear factors, it is
in fact also true that f(x) = m(x). To see this, suppose that we delete any factor
$T - \lambda_\alpha 1$ from f(T) to obtain a new linear operator f'(T). But the $\lambda_i$ are all
distinct so that $f'(T)(v_\alpha) \neq 0$, and hence $f'(T) \neq 0$. Since every proper monic
divisor of f(x) divides some such f'(x), no proper divisor of f(x) can be the
minimal polynomial for T, and we must have f(x) = m(x). ■
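
Theorem 7.24 gives a test that is easy to run by hand or by machine: multiply together the factors A - λ1 taken once for each distinct eigenvalue, and see whether the product vanishes. The sketch below applies this to two small matrices. Both matrices, their eigenvalue lists (read off the diagonal, since the matrices are triangular), and the function names are assumptions chosen for illustration rather than examples from the text.

```python
# Sketch: test diagonalizability via the product of distinct linear factors.

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, lam):
    """Return A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def annihilated_by_distinct_factors(A, eigenvalues):
    """True if the product of (A - lam*I) over the given distinct eigenvalues is zero."""
    n = len(A)
    prod = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for lam in eigenvalues:
        prod = matmul(prod, shifted(A, lam))
    return all(entry == 0 for row in prod for entry in row)

S = [[1, 1], [0, 2]]   # distinct eigenvalues 1 and 2
J = [[2, 1], [0, 2]]   # single eigenvalue 2, repeated

print(annihilated_by_distinct_factors(S, [1, 2]))   # True: (x - 1)(x - 2) annihilates S
print(annihilated_by_distinct_factors(J, [2]))      # False: x - 2 alone does not annihilate J
```

For J the minimal polynomial must therefore be (x - 2)^2, which has a repeated factor, so J is not diagonalizable.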

In a manner similar to that used in Corollary 1 of Theorem 7.14, we can
rephrase Theorem 7.24 in terms of matrices as follows.

Corollary 1 A matrix $A \in M_n(\mathcal{F})$ is similar to a diagonal matrix D if and
only if the minimal polynomial for A has the form

\[ m(x) = (x - \lambda_1) \cdots (x - \lambda_r) \]

where $\lambda_1, \ldots, \lambda_r \in \mathcal{F}$ are all distinct. If this is the case, then $D = P^{-1}AP$ where
P is the invertible matrix whose columns are any set of n linearly independent
eigenvectors $v_1, \ldots, v_n$ of A corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_n$. (If
r < n, then some of the eigenvalues will be repeated.) In addition, the diagonal
elements of D are just the eigenvalues $\lambda_1, \ldots, \lambda_n$.

Corollary 2 A linear transformation $T \in L(V)$ is diagonalizable if and only
if $V = W_1 \oplus \cdots \oplus W_r$ where $W_i = \mathrm{Ker}(T - \lambda_i 1) = V_{\lambda_i}$.

Proof Recall that $V_{\lambda_i}$ is the eigenspace corresponding to the eigenvalue $\lambda_i$,
and the fact that $V_{\lambda_i} = \mathrm{Ker}(T - \lambda_i 1)$ was shown in the proof of Theorem 7.24.
If T is diagonalizable, then the conclusion that $V = W_1 \oplus \cdots \oplus W_r$ follows
directly from Theorems 7.24 and 7.23. On the other hand, if $V = W_1 \oplus \cdots \oplus
W_r$ then each $W_i$ has a basis of eigenvectors and hence so does V (Theorem
2.15). ■

It is important to realize that one eigenvalue can correspond to more than
one linearly independent eigenvector (recall the comment following Theorem
7.8). This is why the spaces $W_i$ in the first part of the proof of Theorem 7.24
can have bases consisting of more than one eigenvector. In particular, any
eigenvalue of multiplicity greater than one can result in an eigenspace of
dimension greater than one, a result that we treat in more detail in the next
section.

Exercises 

1. Write each of the following linear transformations $T \in L(V)$ as a direct
sum of linear transformations whose minimal polynomials are powers of
prime polynomials:

(a) $V = \mathbb{R}^2$: $Te_1 = e_2$, $Te_2 = 3e_1 + 2e_2$.

(b) $V = \mathbb{R}^2$: $Te_1 = -4e_1 + 4e_2$, $Te_2 = e_2$.

(c) $V = \mathbb{R}^3$: $Te_1 = e_2$, $Te_2 = e_3$, $Te_3 = 2e_1 - e_2 + 2e_3$.

(d) $V = \mathbb{R}^3$: $Te_1 = 3e_1$, $Te_2 = e_2 - e_3$, $Te_3 = e_1 + 3e_2$.

(e) $V = \mathbb{R}^3$: $Te_1 = 3e_1 + e_2$, $Te_2 = e_2 - 5e_3$, $Te_3 = 2e_1 + 2e_2 + 2e_3$.

2. Let V be finite-dimensional and suppose $T \in L(V)$ has minimal
polynomial $m(x) = f_1(x)^{n_1} \cdots f_r(x)^{n_r}$ where the $f_i(x)$ are distinct monic primes
and each $n_i \in \mathbb{Z}^+$. Show that the characteristic polynomial is of the form

\[ \Delta(x) = f_1(x)^{d_1} \cdots f_r(x)^{d_r} \]

where

\[ d_i = \dim(\mathrm{Ker}\, f_i(T)^{n_i}) / \deg f_i . \]

3. Let $\mathcal{D} = \{T_i\}$ be a collection of mutually commuting (i.e., $T_i T_j = T_j T_i$ for
all i, j) diagonalizable linear operators on a finite-dimensional space V.
Prove that there exists a basis for V relative to which the matrix
representation of each $T_i$ is diagonal. [Hint: Proceed by induction on dim V. Let
$T \in \mathcal{D}$ ($T \neq c1$) have distinct eigenvalues $\lambda_1, \ldots, \lambda_r$ and for each i define
$W_i = \mathrm{Ker}(T - \lambda_i 1)$. Show that $W_i$ is invariant under each operator that
commutes with T. Define $\mathcal{D}_i = \{T_j|W_i : T_j \in \mathcal{D}\}$ and show that every
member of $\mathcal{D}_i$ is diagonalizable.]

7.7 MORE ON DIAGONALIZATION 

If an operator $T \in L(V)$ is diagonalizable, then in a (suitably numbered) basis
of eigenvectors, its matrix representation A will take the form

\[
A = \begin{pmatrix}
\lambda_1 I_{m_1} & 0 & \cdots & 0 \\
0 & \lambda_2 I_{m_2} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda_r I_{m_r}
\end{pmatrix}
\]

where each $\lambda_i$ is repeated $m_i$ times and $I_{m_i}$ is the $m_i \times m_i$ identity matrix. Note
that $m_1 + \cdots + m_r$ must be equal to dim V. Thus the characteristic polynomial
for T has the form

\[ \Delta_T(x) = \det(xI - A) = (x - \lambda_1)^{m_1} \cdots (x - \lambda_r)^{m_r} \]

which is a product of (possibly repeated) linear factors. That the characteristic
polynomial of a diagonalizable operator is of this form also follows directly
from Theorems 7.24 and 7.12. However, we stress that just because the
characteristic polynomial factors into a product of linear terms does not mean that
the operator is diagonalizable. We now investigate the conditions that
determine just when an operator will be diagonalizable.

Let us assume that T is diagonalizable, and hence that the characteristic
polynomial factors into linear terms. For each distinct eigenvalue $\lambda_i$, we have
seen that the corresponding eigenspace $V_{\lambda_i}$ is just $\mathrm{Ker}(T - \lambda_i 1)$. Relative to a
basis of eigenvectors, the matrix $[T - \lambda_i 1]$ is diagonal with precisely $m_i$ zeros
along its main diagonal (just look at the matrix A shown above and subtract
off $\lambda_i I$). From Theorem 5.15 we know that the rank of a linear transformation
is the same as the rank of its matrix representation, and hence $r(T - \lambda_i 1)$ is just
the number of remaining nonzero rows in $[T - \lambda_i 1]$ which is $\dim V - m_i$ (see
Theorem 3.9). But from Theorem 5.6 we then see that

\[
\begin{aligned}
\dim V_{\lambda_i} = \dim \mathrm{Ker}(T - \lambda_i 1) = \mathrm{nul}(T - \lambda_i 1) &= \dim V - r(T - \lambda_i 1) \\
&= m_i .
\end{aligned}
\]

In other words, if T is diagonalizable, then the dimension of each eigenspace
$V_{\lambda_i}$ is just the multiplicity of the eigenvalue $\lambda_i$. Let us clarify this in terms of
some common terminology. In so doing, we will also repeat this conclusion
from a slightly different viewpoint.

Given a linear operator $T \in L(V)$, what we have called the multiplicity of
an eigenvalue $\lambda$ is the largest positive integer m such that $(x - \lambda)^m$ divides the
characteristic polynomial $\Delta_T(x)$. This is properly called the algebraic
multiplicity of $\lambda$, in contrast to the geometric multiplicity which is the number of
linearly independent eigenvectors belonging to that eigenvalue. In other
words, the geometric multiplicity of $\lambda$ is the dimension of $V_\lambda$. In general, we
will use the word "multiplicity" to mean the algebraic multiplicity. The set of
all eigenvalues of a linear operator $T \in L(V)$ is called the spectrum of T. If
some eigenvalue in the spectrum of T is of algebraic multiplicity > 1, then the
spectrum is said to be degenerate.

If $T \in L(V)$ has an eigenvalue $\lambda$ of algebraic multiplicity m, then it is not
hard for us to show that the dimension of the eigenspace $V_\lambda$ must be less than
or equal to m. Note that since every element of $V_\lambda$ is an eigenvector of T with
eigenvalue $\lambda$, the space $V_\lambda$ must be a T-invariant subspace of V. Furthermore,
every basis for $V_\lambda$ will obviously consist of eigenvectors corresponding to $\lambda$.

Theorem 7.25 Let $T \in L(V)$ have eigenvalue $\lambda$. Then the geometric
multiplicity of $\lambda$ is always less than or equal to its algebraic multiplicity. In other
words, if $\lambda$ has algebraic multiplicity m, then $\dim V_\lambda \leq m$.

Proof Suppose $\dim V_\lambda = r$ and let $\{v_1, \ldots, v_r\}$ be a basis for $V_\lambda$. By
Theorem 2.10, we extend this to a basis $\{v_1, \ldots, v_n\}$ for V. Relative to this
basis, T must have the matrix representation (see Theorem 7.19)

\[ \begin{pmatrix} \lambda I_r & C \\ 0 & D \end{pmatrix} . \]

Applying Theorem 4.14 and the fact that the determinant of a diagonal matrix
is just the product of its (diagonal) elements, we see that the characteristic
polynomial $\Delta_T(x)$ of T is given by

\[
\begin{aligned}
\Delta_T(x) = \begin{vmatrix} (x - \lambda)I_r & -C \\ 0 & xI_{n-r} - D \end{vmatrix}
&= \det[(x - \lambda)I_r] \det(xI_{n-r} - D) \\
&= (x - \lambda)^r \det(xI_{n-r} - D)
\end{aligned}
\]

which shows that $(x - \lambda)^r$ divides $\Delta_T(x)$. Since by definition m is the largest
positive integer such that $(x - \lambda)^m | \Delta_T(x)$, it follows that $r \leq m$. ■

Note that a special case of this theorem arises when an eigenvalue is of
(algebraic) multiplicity 1. In this case, it then follows that the geometric and
algebraic multiplicities are necessarily equal. We now proceed to show just
when this will be true in general. Recall that any polynomial over an
algebraically closed field will factor into linear terms (Theorem 6.13).

Theorem 7.26 Assume that $T \in L(V)$ has a characteristic polynomial that
factors into (not necessarily distinct) linear terms. Let T have distinct
eigenvalues $\lambda_1, \ldots, \lambda_r$ with (algebraic) multiplicities $m_1, \ldots, m_r$ respectively,
and let $\dim V_{\lambda_i} = d_i$. Then T is diagonalizable if and only if $m_i = d_i$ for each
i = 1, . . . , r.

Proof Let dim V = n. We note that since the characteristic polynomial of T is
of degree n and factors into linear terms, it follows that $m_1 + \cdots + m_r = n$. We
first assume that T is diagonalizable. By definition, this means that V has a
basis consisting of n linearly independent eigenvectors of T. Since each of
these basis eigenvectors must belong to at least one of the eigenspaces $V_{\lambda_i}$, it
follows that $V = V_{\lambda_1} + \cdots + V_{\lambda_r}$ and consequently $n \leq d_1 + \cdots + d_r$. From
Theorem 7.25 we know that $d_i \leq m_i$ for each i = 1, . . . , r and hence

\[ n \leq d_1 + \cdots + d_r \leq m_1 + \cdots + m_r = n \]

which implies that $d_1 + \cdots + d_r = m_1 + \cdots + m_r$ or

\[ (m_1 - d_1) + \cdots + (m_r - d_r) = 0 . \]

But each term in this equation is nonnegative (by Theorem 7.25), and hence
we must have $m_i = d_i$ for each i.

Conversely, suppose that $d_i = m_i$ for each i = 1, . . . , r. For each i, we
know that any basis for $V_{\lambda_i}$ consists of linearly independent eigenvectors
corresponding to the eigenvalue $\lambda_i$, while by Theorem 7.8, we know that
eigenvectors corresponding to distinct eigenvalues are linearly independent.
Therefore the union $\mathcal{B}$ of the bases of $\{V_{\lambda_i}\}$ forms a linearly independent set
of $d_1 + \cdots + d_r = m_1 + \cdots + m_r$ vectors. But $m_1 + \cdots + m_r = n = \dim V$, and
hence $\mathcal{B}$ forms a basis for V. Since this shows that V has a basis of
eigenvectors of T, it follows by definition that T must be diagonalizable. ■

The following corollary is a repeat of Corollary 2 of Theorem 7.24. Its
(very easy) proof may be based entirely on the material of this section.

Corollary 1 An operator $T \in L(V)$ is diagonalizable if and only if

\[ V = W_1 \oplus \cdots \oplus W_r \]

where $W_1, \ldots, W_r$ are the eigenspaces corresponding to the distinct
eigenvalues of T.

Proof This is Exercise 7.7.1. ■

Using Theorem 5.6, we see that the geometric multiplicity of an
eigenvalue $\lambda$ is given by

\[ \dim V_\lambda = \dim(\mathrm{Ker}(T - \lambda 1)) = \mathrm{nul}(T - \lambda 1) = \dim V - r(T - \lambda 1) . \]

This observation together with Theorem 7.26 proves the next corollary.

Corollary 2 An operator $T \in L(V)$ whose characteristic polynomial factors
into linear terms is diagonalizable if and only if the algebraic multiplicity of $\lambda$
is equal to $\dim V - r(T - \lambda 1)$ for each eigenvalue $\lambda$.

Example 7.10 Consider the operator $T \in L(\mathbb{R}^3)$ defined by

\[ T(x, y, z) = (9x + y,\ 9y,\ 7z) . \]

Relative to the standard basis for $\mathbb{R}^3$, the matrix representation of T is given by

\[ A = \begin{pmatrix} 9 & 1 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 7 \end{pmatrix} \]

and hence the characteristic polynomial is

\[ \Delta_A(\lambda) = \det(A - \lambda I) = (9 - \lambda)^2(7 - \lambda) \]

which is a product of linear factors. However,

\[ A - 9I = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -2 \end{pmatrix} \]

which clearly has rank 2, and hence nul(T - 9) = 3 - 2 = 1 which is not the
same as the algebraic multiplicity of $\lambda = 9$. Thus T is not diagonalizable. ∆
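
The rank computation in Example 7.10 is an instance of Corollary 2 of Theorem 7.26, and it can be automated. The following sketch computes the geometric multiplicity dim V - r(A - λI) by row reduction over the rationals and applies it to the matrix A of the example. The function names rank and geometric_multiplicity are hypothetical helpers written for this illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # number of pivots found so far
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """dim Ker(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

A = [[9, 1, 0],
     [0, 9, 0],
     [0, 0, 7]]

print(geometric_multiplicity(A, 9))   # 1, although the algebraic multiplicity of 9 is 2
print(geometric_multiplicity(A, 7))   # 1, equal to the algebraic multiplicity of 7
```

Since the geometric multiplicity of the eigenvalue 9 falls short of its algebraic multiplicity, the criterion of Theorem 7.26 fails, in agreement with the example.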

Example 7.11 Consider the operator on $\mathbb{R}^3$ defined by the following matrix:

\[ A = \begin{pmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{pmatrix} . \]

In order to avoid factoring a cubic polynomial, we compute the characteristic
polynomial $\Delta_A(x) = \det(xI - A)$ by applying Theorem 4.4 as follows (the
reader should be able to see exactly what elementary row operations were
performed in each step).

\[
\begin{aligned}
\begin{vmatrix} x-5 & 6 & 6 \\ 1 & x-4 & -2 \\ -3 & 6 & x+4 \end{vmatrix}
&= \begin{vmatrix} x-2 & 0 & -x+2 \\ 1 & x-4 & -2 \\ -3 & 6 & x+4 \end{vmatrix} \\
&= (x-2)\begin{vmatrix} 1 & 0 & -1 \\ 1 & x-4 & -2 \\ -3 & 6 & x+4 \end{vmatrix} \\
&= (x-2)\begin{vmatrix} 1 & 0 & -1 \\ 0 & x-4 & -1 \\ 0 & 6 & x+1 \end{vmatrix} \\
&= (x-2)\begin{vmatrix} x-4 & -1 \\ 6 & x+1 \end{vmatrix} \\
&= (x-2)^2(x-1) .
\end{aligned}
\]

We now see that A has eigenvalue $\lambda_1 = 1$ with (algebraic) multiplicity 1, and
eigenvalue $\lambda_2 = 2$ with (algebraic) multiplicity 2. From Theorem 7.25 we
know that the algebraic and geometric multiplicities of $\lambda_1$ are necessarily the
same and equal to 1, so we need only consider $\lambda_2$. Observing that

\[ A - 2I = \begin{pmatrix} 3 & -6 & -6 \\ -1 & 2 & 2 \\ 3 & -6 & -6 \end{pmatrix} \]

it is obvious that r(A - 2I) = 1, and hence nul(A - 2I) = 3 - 1 = 2. This shows
that A is indeed diagonalizable.

Let us now construct bases for the eigenspaces $W_i = V_{\lambda_i}$. This means that
we seek vectors $v = (x, y, z) \in \mathbb{R}^3$ such that $(A - \lambda_i I)v = 0$. This is easily
solved by the usual row reduction techniques as follows. For $\lambda_1 = 1$ we have

\[
A - I = \begin{pmatrix} 4 & -6 & -6 \\ -1 & 3 & 2 \\ 3 & -6 & -5 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & -1 \\ -1 & 3 & 2 \\ 3 & -6 & -5 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 1 \\ 0 & -6 & -2 \end{pmatrix}
\to \begin{pmatrix} 1 & 0 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\]

which has the solutions x = z and y = -z/3 = -x/3. Therefore $W_1$ is spanned by
the single eigenvector $v_1 = (3, -1, 3)$. As to $\lambda_2 = 2$, we proceed in a similar
manner to obtain

\[
A - 2I = \begin{pmatrix} 3 & -6 & -6 \\ -1 & 2 & 2 \\ 3 & -6 & -6 \end{pmatrix}
\to \begin{pmatrix} 1 & -2 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

which implies that any vector (x, y, z) with x = 2y + 2z will work. For
example, we can let x = 0 and y = 1 to obtain z = -1, and hence one basis vector for
$W_2$ is given by $v_2 = (0, 1, -1)$. If we let x = 1 and y = 0, then we have z = 1/2
so that another independent basis vector for $W_2$ is given by $v_3 = (2, 0, 1)$. In
terms of these eigenvectors, the transformation matrix P that diagonalizes A is
given by

\[ P = \begin{pmatrix} 3 & 0 & 2 \\ -1 & 1 & 0 \\ 3 & -1 & 1 \end{pmatrix} \]

and we leave it to the reader to verify that AP = PD (i.e., $P^{-1}AP = D$) where D
is the diagonal matrix with diagonal elements $d_{11} = 1$ and $d_{22} = d_{33} = 2$.

Finally, we note that since A is diagonalizable, Theorems 7.12 and 7.24
show that the minimal polynomial for A must be (x - 1)(x - 2). ∆
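
The two verifications left to the reader in Example 7.11 can be carried out with a few lines of exact integer arithmetic. The sketch below checks that AP = PD for the matrices of the example and that (A - I)(A - 2I) is the zero matrix, so that (x - 1)(x - 2) does annihilate A. The helper names matmul and shifted are illustrative and not part of the text.

```python
# Sketch: verify AP = PD and (A - I)(A - 2I) = 0 for Example 7.11.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def shifted(A, lam):
    """Return A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

A = [[ 5, -6, -6],
     [-1,  4,  2],
     [ 3, -6, -4]]

# Columns of P are the eigenvectors v1 = (3, -1, 3), v2 = (0, 1, -1), v3 = (2, 0, 1).
P = [[ 3,  0, 2],
     [-1,  1, 0],
     [ 3, -1, 1]]

D = [[1, 0, 0],
     [0, 2, 0],
     [0, 0, 2]]

zero = [[0, 0, 0] for _ in range(3)]

print(matmul(A, P) == matmul(P, D))                  # AP = PD
print(matmul(shifted(A, 1), shifted(A, 2)) == zero)  # (A - I)(A - 2I) = 0
```

Working with AP = PD rather than $P^{-1}AP = D$ keeps every entry an integer and avoids inverting P.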



Exercises 

1. Prove Corollary 1 of Theorem 7.26. 

2. Show that two similar matrices A and B have the same eigenvalues, and 
these eigenvalues have the same geometric multiplicities. 

3. Let $\lambda_1, \ldots, \lambda_r \in \mathcal{F}$ be distinct, and let $D \in M_n(\mathcal{F})$ be diagonal with a
characteristic polynomial of the form

\[ \Delta_D(x) = (x - \lambda_1)^{d_1} \cdots (x - \lambda_r)^{d_r} . \]

Let V be the space of all n x n matrices B that commute with D, i.e., the
set of all B such that BD = DB. Prove that $\dim V = d_1^2 + \cdots + d_r^2$.

4. Relative to the standard basis, let $T \in L(\mathbb{R}^4)$ be represented by

\[ A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \end{pmatrix} . \]

Find conditions on a, b and c such that T is diagonalizable.

5. Determine whether or not each of the following matrices is diagonalizable.
If it is, find a nonsingular matrix P and a diagonal matrix D such
that $P^{-1}AP = D$.





[The matrices for parts (a) through (g) are not recoverable from the source.]



6. Determine whether or not each of the following operators $T \in L(\mathbb{R}^3)$ is
diagonalizable. If it is, find an eigenvector basis for $\mathbb{R}^3$ such that [T] is
diagonal.

(a) T(x, y, z) = (-y, x, 3z).

(b) T(x, y, z) = (8x + 2y - 2z, 3x + 3y - z, 24x + 8y - 6z).

(c) T(x, y, z) = (4x + z, 2x + 3y + 2z, x + 4z).

(d) T(x, y, z) = (-2y - 3z, x + 3y + 3z, z).

7. Suppose a matrix A is diagonalizable. Prove that $A^m$ is diagonalizable for
any positive integer m.

8. Summarize several of our results by proving the following theorem:

Let V be finite-dimensional, suppose $T \in L(V)$ has distinct eigenvalues
$\lambda_1, \ldots, \lambda_r$, and let $W_i = \mathrm{Ker}(T - \lambda_i 1)$. Then the following are equivalent:

(a) T is diagonalizable.

(b) $\Delta_T(x) = (x - \lambda_1)^{m_1} \cdots (x - \lambda_r)^{m_r}$ and $W_i$ is of dimension $m_i$ for each
i = 1, . . . , r.

(c) $\dim W_1 + \cdots + \dim W_r = \dim V$.

9. Let $V_3$ be the space of real polynomials of degree at most 3, and let f' and
f'' denote the first and second derivatives of $f \in V_3$. Define $T \in L(V_3)$ by
T(f) = f' + f''. Decide whether or not T is diagonalizable, and if it is, find
a basis for $V_3$ such that [T] is diagonal.

10. (a) Let $V_2$ be the space of real polynomials of degree at most 2, and
define $T \in L(V_2)$ by $T(ax^2 + bx + c) = cx^2 + bx + a$. Decide whether or
not T is diagonalizable, and if it is, find a basis for $V_2$ such that [T] is
diagonal.

(b) Repeat part (a) with T = (x + 1)(d/dx). (See Exercise 7.3.17.)

7.8 PROJECTIONS

In this section we introduce the concept of projection operators and show how
they may be related to direct sum decompositions where each of the subspaces
in the direct sum is invariant under some linear operator.

Suppose that U and W are subspaces of a vector space V with the property
that $V = U \oplus W$. Then every $v \in V$ has a unique representation of the form
v = u + w where $u \in U$ and $w \in W$ (Theorem 2.14). We now define the
mapping $E: V \to V$ by Ev = u. Note that E is well-defined since the direct
sum decomposition is unique. Moreover, given any other $v' \in V = U \oplus W$
with v' = u' + w', we know that v + v' = (u + u') + (w + w') and kv = ku + kw,
and hence it is easy to see that E is in fact linear because

\[ E(v + v') = u + u' = Ev + Ev' \]

and

\[ E(kv) = ku = k(Ev) . \]

The linear operator $E \in L(V)$ is called the projection of V on U in the
direction of W. Furthermore, since any $u \in U \subset V$ may be written in the form
u = u + 0, we also see that Eu = u and therefore

\[ E^2 v = E(Ev) = Eu = u = Ev . \]

In other words, a projection operator E has the property that $E^2 = E$. By way
of terminology, any operator $T \in L(V)$ with the property that $T^2 = T$ is said to
be idempotent.

On the other hand, given a vector space V, suppose we have an operator
$E \in L(V)$ with the property that $E^2 = E$. We claim that $V = \mathrm{Im}\, E \oplus \mathrm{Ker}\, E$.
Indeed, first note that if $u \in \mathrm{Im}\, E$, then by definition this means there exists
$v \in V$ with the property that Ev = u. It therefore follows that

\[ Eu = E(Ev) = E^2 v = Ev = u \]

and thus Eu = u for any $u \in \mathrm{Im}\, E$. Conversely, the equation Eu = u obviously
says that $u \in \mathrm{Im}\, E$, and hence we see that $u \in \mathrm{Im}\, E$ if and only if Eu = u.
Next, note that given any $v \in V$ we may clearly write

\[ v = Ev + v - Ev = Ev + (1 - E)v \]

where by definition, $Ev \in \mathrm{Im}\, E$. Since

\[ E[(1 - E)v] = (E - E^2)v = (E - E)v = 0 \]

we see that $(1 - E)v \in \mathrm{Ker}\, E$, and hence $V = \mathrm{Im}\, E + \mathrm{Ker}\, E$. We claim that this
sum is in fact direct. To see this, let $w \in \mathrm{Im}\, E \cap \mathrm{Ker}\, E$. Since $w \in \mathrm{Im}\, E$ and
$E^2 = E$, we have seen that Ew = w, while the fact that $w \in \mathrm{Ker}\, E$ means that
Ew = 0. Therefore w = 0 so that $\mathrm{Im}\, E \cap \mathrm{Ker}\, E = \{0\}$, and hence

\[ V = \mathrm{Im}\, E \oplus \mathrm{Ker}\, E . \]

Since we have now shown that any $v \in V$ may be written in the unique form
v = u + w with $u \in \mathrm{Im}\, E$ and $w \in \mathrm{Ker}\, E$, it follows that Ev = Eu + Ew = u + 0
= u so that E is the projection of V on Im E in the direction of Ker E.

It is also of use to note that

\[ \mathrm{Ker}\, E = \mathrm{Im}(1 - E) \]

and

\[ \mathrm{Ker}(1 - E) = \mathrm{Im}\, E . \]

To see this, suppose $w \in \mathrm{Ker}\, E$. Then

\[ w = Ew + (1 - E)w = (1 - E)w \]

which implies that $w \in \mathrm{Im}(1 - E)$, and hence $\mathrm{Ker}\, E \subset \mathrm{Im}(1 - E)$. On the other
hand, if $w \in \mathrm{Im}(1 - E)$ then there exists $w' \in V$ such that w = (1 - E)w' and
hence

\[ Ew = (E - E^2)w' = (E - E)w' = 0 \]

so that $w \in \mathrm{Ker}\, E$. This shows that $\mathrm{Im}(1 - E) \subset \mathrm{Ker}\, E$, and therefore Ker E =
Im(1 - E). The similar proof that Ker(1 - E) = Im E is left as an exercise for
the reader (Exercise 7.8.1).
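
To make these relations concrete, the following sketch works with one oblique projection on the plane. The matrix E (projection onto the x-axis along the line spanned by (1, -1)), the test vector v, and the helper names are all assumptions chosen for illustration; they are not taken from the text.

```python
# Sketch: an idempotent 2 x 2 matrix and the decomposition v = Ev + (1 - E)v.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(A, v):
    """Apply the matrix A to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

E = [[1, 1],
     [0, 0]]                  # assumed projection: E(x, y) = (x + y, 0)
F = [[0, -1],
     [0,  1]]                 # 1 - E

zero = [[0, 0], [0, 0]]

print(matmul(E, E) == E)      # E is idempotent
print(matmul(F, F) == F)      # so is 1 - E
print(matmul(E, F) == zero)   # Im(1 - E) lies in Ker E

v = [3, 4]
u = apply(E, v)               # component in Im E
w = apply(F, v)               # component in Ker E
print(u, w)                   # [7, 0] [-4, 4]
print([a + b for a, b in zip(u, w)] == v)   # v = u + w
print(apply(E, w) == [0, 0])  # w is indeed in Ker E
```

Here Im E is the x-axis and Ker E is the line x + y = 0, so the two subspaces are not orthogonal; the decomposition is nevertheless unique, as the discussion above shows.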

Theorem 7.27 Let V be a vector space with dim V = n, and suppose $E \in
L(V)$ has rank k = dim(Im E). Then E is idempotent (i.e., $E^2 = E$) if and only if
any one of the following statements is true:

(a) If $v \in \mathrm{Im}\, E$, then Ev = v.

(b) $V = \mathrm{Im}\, E \oplus \mathrm{Ker}\, E$ and E is the projection of V on Im E in the direction of
Ker E.

(c) Im E = Ker(1 - E) and Ker E = Im(1 - E).

(d) It is possible to choose a basis for V such that $[E] = I_k \oplus 0_{n-k}$.

Proof Suppose $E^2 = E$. In view of the above discussion, all that remains is to
prove part (d). Applying part (b), we let $\{e_1, \ldots, e_k\}$ be a basis for Im E and
$\{e_{k+1}, \ldots, e_n\}$ be a basis for Ker E. By part (a), we know that $Ee_i = e_i$ for i =
1, . . . , k, and by definition of Ker E, we have $Ee_i = 0$ for i = k + 1, . . . , n. But
then [E] has the desired form since the ith column of [E] is just $Ee_i$.

Conversely, suppose (a) is true and $v \in V$ is arbitrary. Then $E^2 v = E(Ev) =
Ev$ implies that $E^2 = E$. Now suppose that (b) is true and $v \in V$. Then v = u +
w where $u \in \mathrm{Im}\, E$ and $w \in \mathrm{Ker}\, E$. Therefore Ev = Eu + Ew = Eu = u (by
definition of projection) and $E^2 v = E^2 u = Eu = u$ so that $E^2 v = Ev$ for all $v \in V$,
and hence $E^2 = E$. If (c) holds and $v \in V$, then $Ev \in \mathrm{Im}\, E = \mathrm{Ker}(1 - E)$ so that
$0 = (1 - E)Ev = Ev - E^2 v$ and hence $E^2 v = Ev$ again. Similarly, $(1 - E)v \in
\mathrm{Im}(1 - E) = \mathrm{Ker}\, E$ so that $0 = E(1 - E)v = Ev - E^2 v$ and hence $E^2 v = Ev$. In
either case, we have $E^2 = E$. Finally, from the form of [E] given in (d), it is
obvious that $E^2 = E$. ■

It is also worth making the following observation. If we are given a vector
space V and a subspace W ⊂ V, then there may be many subspaces U ⊂ V
with the property that V = U ⊕ W. For example, the space ℝ³ is not necessarily
represented by the usual orthogonal Cartesian coordinate system. Rather, it
may be viewed as consisting of a line plus any (oblique) plane not containing
the given line. However, in the particular case that V = W ⊕ W^⊥, then W^⊥ is
uniquely specified by W (see Section 2.5). In this case, the projection E ∈
L(V) defined by Ev = w with w ∈ W is called the orthogonal projection of V
on W. In other words, E is an orthogonal projection if (Im E)^⊥ = Ker E. By the
corollary to Theorem 2.22, this is equivalent to the requirement that
(Ker E)^⊥ = Im E.

It is not hard to generalize these results to the direct sum of more than two
subspaces. Indeed, suppose that we have a vector space V such that V = W_1 ⊕
⋯ ⊕ W_r. Since any v ∈ V has the unique representation as v = w_1 + ⋯ + w_r
with w_i ∈ W_i, we may define for each j = 1, ..., r the operator E_j ∈ L(V) by
E_j v = w_j. That each E_j is in fact linear is easily shown exactly as above for the
simpler case. It should also be clear that Im E_j = W_j (see Exercise 7.8.2). If we
write

w_j = 0 + ⋯ + 0 + w_j + 0 + ⋯ + 0

as the unique expression for w_j ∈ W_j ⊂ V, then we see that E_j w_j = w_j, and
hence for any v ∈ V we have

E_j² v = E_j(E_j v) = E_j w_j = w_j = E_j v

so that E_j² = E_j.

The representation of each w_j as E_j v is very useful because we may write
any v ∈ V as

v = w_1 + ⋯ + w_r = E_1 v + ⋯ + E_r v = (E_1 + ⋯ + E_r)v

and thus we see that E_1 + ⋯ + E_r = 1. Furthermore, since the image of E_j is
W_j, it follows that if E_j v = 0 then w_j = 0, and hence

Ker E_j = W_1 ⊕ ⋯ ⊕ W_{j-1} ⊕ W_{j+1} ⊕ ⋯ ⊕ W_r .

We then see that for any j = 1, ..., r we have V = Im E_j ⊕ Ker E_j exactly as
before. It is also easy to see that E_i E_j = 0 if i ≠ j because Im E_j = W_j ⊂ Ker E_i.

Theorem 7.28 Let V be a vector space, and suppose that V = W_1 ⊕ ⋯ ⊕
W_r. Then for each j = 1, ..., r there exists a linear operator E_j ∈ L(V) with
the following properties:

(a) 1 = E_1 + ⋯ + E_r.

(b) E_i E_j = 0 if i ≠ j.

(c) E_j² = E_j.

(d) Im E_j = W_j.

Conversely, if {E_1, ..., E_r} are linear operators on V that obey properties (a)
and (b), then each E_j is idempotent and V = W_1 ⊕ ⋯ ⊕ W_r where W_j = Im E_j.

Proof In view of the previous discussion, we only need to prove the converse
statement. From (a) and (b) we see that

E_j = E_j 1 = E_j(E_1 + ⋯ + E_r) = E_j² + Σ_{i≠j} E_j E_i = E_j²

which shows that each E_j is idempotent. Next, property (a) shows us that for
any v ∈ V we have

v = 1v = E_1 v + ⋯ + E_r v

and hence V = W_1 + ⋯ + W_r where we have defined W_j = Im E_j. Now suppose
that 0 = w_1 + ⋯ + w_r where each w_j ∈ W_j. If we can show that this
implies w_1 = ⋯ = w_r = 0, then any v ∈ V will have a unique representation
v = v_1 + ⋯ + v_r with v_i ∈ W_i. This is because if

v = v_1 + ⋯ + v_r = v_1' + ⋯ + v_r'

then

(v_1 - v_1') + ⋯ + (v_r - v_r') = 0

would imply that v_i - v_i' = 0 for each i, and thus v_i = v_i'. Hence it will follow
that V = W_1 ⊕ ⋯ ⊕ W_r (Theorem 2.14).

Since w_1 + ⋯ + w_r = 0, it is obvious that E_j(w_1 + ⋯ + w_r) = 0. However,
note that E_j w_i = 0 if i ≠ j (because w_i ∈ Im E_i and E_j E_i = 0), while E_j w_j = w_j
(since w_j = E_j w' for some w' ∈ V and hence E_j w_j = E_j² w' = E_j w' = w_j).
Therefore 0 = E_j(w_1 + ⋯ + w_r) = w_j for each j, which shows that
w_1 = ⋯ = w_r = 0 as desired. ■

We now turn our attention to invariant direct sum decompositions, refer- 
ring to Section 7.5 for notation. We saw in Corollary 1 of Theorem 7.26 that a 
diagonalizable operator T ∈ L(V) leads to a direct sum decomposition of V in
terms of the eigenspaces of T. However, Theorem 7.28 shows us that such a 
decomposition should lead to a collection of projections on these eigenspaces. 
Our next theorem elaborates on this observation in detail. Before stating and 
proving this result however, let us take another look at a matrix that has been 
diagonalized. We observe that a diagonal matrix of the form 

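\[
A = \begin{pmatrix}
\lambda_1 I_{m_1} & 0 & \cdots & 0 \\
0 & \lambda_2 I_{m_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_r I_{m_r}
\end{pmatrix}
\]

can also be written as

\[
A = \lambda_1 \begin{pmatrix}
I_{m_1} & & & \\
& 0_{m_2} & & \\
& & \ddots & \\
& & & 0_{m_r}
\end{pmatrix}
+ \cdots +
\lambda_r \begin{pmatrix}
0_{m_1} & & & \\
& 0_{m_2} & & \\
& & \ddots & \\
& & & I_{m_r}
\end{pmatrix} .
\]

If we define E_i to be the matrix obtained from A by setting λ_i = 1 and λ_j = 0
for each j ≠ i (i.e., the ith matrix in the above expression), then this may be
written in the simple form

A = λ_1 E_1 + λ_2 E_2 + ⋯ + λ_r E_r

where clearly

I = E_1 + E_2 + ⋯ + E_r .

Furthermore, it is easy to see that the matrices E_i have the property that

E_i E_j = 0 if i ≠ j

and

E_i² = E_i .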

With these observations in mind, we now prove this result in general. 

Theorem 7.29 If T ∈ L(V) is a diagonalizable operator with distinct eigenvalues
λ_1, ..., λ_r, then there exist linear operators E_1, ..., E_r in L(V) such
that:

(a) 1 = E_1 + ⋯ + E_r.

(b) E_i E_j = 0 if i ≠ j.

(c) T = λ_1 E_1 + ⋯ + λ_r E_r.

(d) E_j² = E_j.

(e) Im E_j = W_j where W_j = Ker(T - λ_j 1) is the eigenspace corresponding
to λ_j.

Conversely, if there exist distinct scalars λ_1, ..., λ_r and distinct nonzero
linear operators E_1, ..., E_r satisfying properties (a), (b) and (c), then properties
(d) and (e) are also satisfied, and T is diagonalizable with λ_1, ..., λ_r as
its distinct eigenvalues.



Proof First assume T is diagonalizable with distinct eigenvalues λ_1, ..., λ_r
and let W_1, ..., W_r be the corresponding eigenspaces. By Corollary 1 of
Theorem 7.26 we know that V = W_1 ⊕ ⋯ ⊕ W_r. (Note that we do not base
this result on Theorem 7.24, and hence the present theorem does not depend in
any way on the primary decomposition theorem.) Then Theorem 7.28 shows
the existence of the projection operators E_1, ..., E_r satisfying properties (a),
(b), (d) and (e). As to property (c), we see (by property (a)) that for any v ∈ V
we have v = E_1 v + ⋯ + E_r v. Since E_j v ∈ W_j, we know from the definition of
eigenspace that T(E_j v) = λ_j(E_j v), and therefore

Tv = T(E_1 v) + ⋯ + T(E_r v)
   = λ_1(E_1 v) + ⋯ + λ_r(E_r v)
   = (λ_1 E_1 + ⋯ + λ_r E_r)v

which verifies property (c).

Now suppose that we are given a linear operator T ∈ L(V) together with
distinct scalars λ_1, ..., λ_r and (nonzero) linear operators E_1, ..., E_r that obey
properties (a), (b) and (c). Multiplying (a) by E_i and using (b) proves (d). Now
multiply (c) from the right by E_i and use properties (b) and (d) to obtain
TE_i = λ_i E_i or (T - λ_i 1)E_i = 0. If w_i ∈ Im E_i is arbitrary, then w_i = E_i w_i' for
some w_i' ∈ V and hence (T - λ_i 1)w_i = (T - λ_i 1)E_i w_i' = 0 which shows that
w_i ∈ Ker(T - λ_i 1). Since E_i ≠ 0, this shows the existence of a nonzero vector
w_i ∈ Ker(T - λ_i 1) with the property that Tw_i = λ_i w_i. This proves that each λ_i
is an eigenvalue of T. We claim that T has no eigenvalues other than the
{λ_i}. To see this, let α be any scalar and assume that (T - α1)v = 0 for some
nonzero v ∈ V. Using properties (a) and (c), we see that

T - α1 = (λ_1 - α)E_1 + ⋯ + (λ_r - α)E_r

and hence letting both sides of this equation act on v yields

0 = (λ_1 - α)E_1 v + ⋯ + (λ_r - α)E_r v .

Multiplying this last equation from the left by E_i and using properties (b) and
(d), we then see that (λ_i - α)E_i v = 0 for every i = 1, ..., r. Since v ≠ 0 may be
written as v = E_1 v + ⋯ + E_r v, it must be true that E_j v ≠ 0 for some j, and
hence in this case we have λ_j - α = 0 or α = λ_j.

We must still show that T is diagonalizable, and that Im E_i = Ker(T - λ_i 1).
It was shown in the previous paragraph that any nonzero w_j ∈ Im E_j satisfies
Tw_j = λ_j w_j, and hence any nonzero vector in the image of any E_i is an
eigenvector of T. Note this says that Im E_i ⊂ Ker(T - λ_i 1). Using property (a), we
see that any w ∈ V may be written as w = E_1 w + ⋯ + E_r w which shows that
V is spanned by eigenvectors of T. But this is just what we mean when we say
that T is diagonalizable. Finally, suppose w_i ∈ Ker(T - λ_i 1) is arbitrary. Then
(T - λ_i 1)w_i = 0 and hence (exactly as we showed above)

0 = (λ_1 - λ_i)E_1 w_i + ⋯ + (λ_r - λ_i)E_r w_i .

Thus for each j = 1, ..., r we have

0 = (λ_j - λ_i)E_j w_i

which implies E_j w_i = 0 for j ≠ i. Since w_i = E_1 w_i + ⋯ + E_r w_i while E_j w_i = 0
for j ≠ i, we conclude that w_i = E_i w_i which shows that w_i ∈ Im E_i. In other
words, we have also shown that Ker(T - λ_i 1) ⊂ Im E_i. Together with our
earlier result, this proves that Im E_i = Ker(T - λ_i 1). ■
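For a concrete matrix, the operators E_j of Theorem 7.29 can be computed explicitly. The sketch below is an illustration under stated assumptions: it takes the construction E_j = p_j(T) suggested in Exercise 7.8.3, with p_j the Lagrange interpolating polynomial for λ_j, and applies it to a hypothetical symmetric 2 x 2 matrix with eigenvalues 1 and 3. The function name `spectral_projections` and the sample matrix are not from the text, and the routine assumes the caller supplies the complete list of distinct eigenvalues of a diagonalizable matrix.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spectral_projections(A, eigenvalues):
    """Return E_j = prod_{i != j} (A - lam_i 1) / (lam_j - lam_i) for each j.

    Assumes A is diagonalizable and `eigenvalues` lists all of its
    distinct eigenvalues exactly once.
    """
    n = len(A)
    projections = []
    for j, lam_j in enumerate(eigenvalues):
        E = identity(n)
        for i, lam_i in enumerate(eigenvalues):
            if i == j:
                continue
            factor = [[(Fraction(A[r][c]) - (lam_i if r == c else 0))
                       / (lam_j - lam_i) for c in range(n)] for r in range(n)]
            E = matmul(E, factor)
        projections.append(E)
    return projections

# Assumed example: A has eigenvalues 1 and 3.
A = [[2, 1],
     [1, 2]]
lams = [1, 3]
E1, E2 = spectral_projections(A, lams)

# Properties (a), (b), (c) of Theorem 7.29, checked entry by entry.
total = [[E1[r][c] + E2[r][c] for c in range(2)] for r in range(2)]
recon = [[lams[0] * E1[r][c] + lams[1] * E2[r][c] for c in range(2)]
         for r in range(2)]
print(total == identity(2), matmul(E1, E2) == [[0, 0], [0, 0]], recon == A)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the identities hold exactly rather than up to rounding.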



Exercises 

1. (a) Let E be an idempotent linear operator. Show that Ker(1 - E) = Im E.
(b) If E² = E, show that (1 - E)² = 1 - E.

2. Let V = W_1 ⊕ ⋯ ⊕ W_r and suppose v = w_1 + ⋯ + w_r ∈ V. For each j =
1, ..., r we define the operator E_j on V by E_j v = w_j.

(a) Show that E_j ∈ L(V).

(b) Show that Im E_j = W_j.

3. Give a completely independent proof of Theorem 7.24 as follows:

(a) Let T ∈ L(V) be diagonalizable with decomposition T = λ_1 E_1 + ⋯ +
λ_r E_r. Show that f(T) = f(λ_1)E_1 + ⋯ + f(λ_r)E_r for any f(x) ∈ ℱ[x].

(b) Use part (a) to conclude that the minimal polynomial for T must be of
the form m(x) = (x - λ_1) ⋯ (x - λ_r).

(c) Now suppose T ∈ L(V) has minimal polynomial

m(x) = (x - λ_1) ⋯ (x - λ_r)

where λ_1, ..., λ_r ∈ ℱ are distinct. Define the polynomials

p_j(x) = ∏_{i≠j} (x - λ_i)/(λ_j - λ_i) .

Note that deg p_j = r - 1 < deg m. By Exercise 6.4.2, any polynomial f of
degree ≤ r - 1 can be written as f = Σ_j f(λ_j)p_j. Defining E_j = p_j(T), show
that E_j ≠ 0 and that

1 = E_1 + ⋯ + E_r

and

T = λ_1 E_1 + ⋯ + λ_r E_r .




(d) Show that m|p_i p_j for i ≠ j, and hence show that E_i E_j = 0 for i ≠ j.

(e) Conclude that T is diagonalizable.

4. Let E_1, ..., E_r and W_1, ..., W_r be as defined in Theorem 7.28, and suppose
T ∈ L(V).

(a) If TE_i = E_i T for every E_i, prove that every W_j = Im E_j is T-invariant.

(b) If every W_j is T-invariant, prove that TE_i = E_i T for every E_i. [Hint:
Let v ∈ V be arbitrary. Show that property (a) of Theorem 7.28 implies
T(E_i v) = w_i for some w_i ∈ W_i = Im E_i. Now show that E_j(TE_i)v =
(E_i w_i)δ_ij, and hence that E_j(Tv) = T(E_j v).]

5. Prove that property (e) in Theorem 7.29 holds for the matrices E_i given
prior to the theorem.

6. Let W be a finite-dimensional subspace of an inner product space V.

(a) Show that there exists precisely one orthogonal projection on W.

(b) Let E be the orthogonal projection on W. Show that for any v ∈ V we
have ‖v - Ev‖ ≤ ‖v - w‖ for every w ∈ W. In other words, show that Ev is
the unique element of W that is "closest" to v.



7.9 QUOTIENT SPACES

Recall that in Section 1.5 we gave a brief description of normal subgroups and 
quotient groups (see Theorem 1.12). In this section we elaborate on and apply 
this concept to vector spaces, which are themselves abelian groups. In the next 
section we will apply this formalism to proving the triangular form theorem 
for linear operators. 

Let V be a vector space and W a subspace of V. Since W is an abelian
subgroup of V, it is easy to see that W is just a normal subgroup since for any
w ∈ W and v ∈ V we have v + w + (-v) = w ∈ W (remember that group
multiplication in an abelian group is frequently denoted by the usual addition
sign, and the inverse of an element v is just -v). We may therefore define the
quotient group V/W whose elements, v + W for any v ∈ V, are just the cosets
of W in V. It should be obvious that V/W is also an abelian group. In fact, we
will show below that V/W can easily be made into a vector space.

Example 7.12 Let V = ℝ² and suppose

W = {(x, y) ∈ ℝ²: y = mx for some fixed scalar m} .

In other words, W is just a line through the origin in the plane ℝ². The
elements of V/W are the cosets v + W = {v + w: w ∈ W} where v is any vector
in V.

[Figure: cosets of W drawn in the xy-plane.]

Therefore, the set V/W consists of all lines in ℝ² that are parallel to W (i.e.,
that are displaced from W by the vector v). ∆

While we proved in Section 1.5 that cosets partition a group into disjoint 
subsets, let us repeat this proof in a different manner that should help famil- 
iarize us with some of the properties of V/W. We begin with several simple 
properties that are grouped together as a theorem for the sake of reference. 

Theorem 7.30 Let W be a subspace of a vector space V. Then the following
properties are equivalent:

(a) u ∈ v + W;

(b) u - v ∈ W;

(c) v ∈ u + W;

(d) u + W = v + W.

Proof (a) ⇒ (b): If u ∈ v + W, then there exists w ∈ W such that u = v + w.
But then u - v = w ∈ W.

(b) ⇒ (c): u - v ∈ W implies v - u = -(u - v) ∈ W, and hence there
exists w ∈ W such that v - u = w. But then v = u + w ∈ u + W.

(c) ⇒ (d): If v ∈ u + W, then there exists w ∈ W such that v = u + w. But
then v + W = u + w + W = u + W.

(d) ⇒ (a): 0 ∈ W implies u = u + 0 ∈ u + W = v + W. ■
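Theorem 7.30 gives a practical test for deciding whether two vectors determine the same coset: it suffices to check that their difference lies in W. The following minimal sketch applies this to the line W = {(x, y): y = mx} of Example 7.12. The function names `in_W`, `same_coset` and `coset_label`, and the sample slope m = 2, are assumptions made for illustration only.

```python
def in_W(p, m):
    """Return True when the point p = (x, y) lies on the line y = m x."""
    x, y = p
    return y == m * x

def same_coset(u, v, m):
    """Theorem 7.30 (b) <=> (d): u + W = v + W exactly when u - v is in W."""
    diff = (u[0] - v[0], u[1] - v[1])
    return in_W(diff, m)

def coset_label(p, m):
    """The y-intercept of the line p + W, which identifies the coset."""
    x, y = p
    return y - m * x

m = 2                                   # assumed slope of W
print(same_coset((1, 5), (3, 9), m))    # both points lie on y = 2x + 3
print(same_coset((1, 5), (0, 0), m))    # (0, 0) lies on W itself
print(coset_label((1, 5), m), coset_label((3, 9), m))
```

The y-intercept works as a label because every line parallel to W (for finite slope m) meets the y-axis in exactly one point.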
Theorem 7.31 Let W be a subspace of a vector space V. Then the cosets of
W in V are distinct, and every element of V lies in some coset of W.

Proof It is easy to see that any v ∈ V lies in some coset of W since v = v + 0
∈ v + W. Now suppose that v_1 ≠ v_2 and that the cosets v_1 + W and v_2 + W have
some element u in common. Then u ∈ v_1 + W and u ∈ v_2 + W, and hence by
Theorem 7.30 we have v_1 + W = u + W = v_2 + W. ■

Let V be a vector space over ℱ and let W be a subspace of V. We propose
to make V/W into a vector space. If α ∈ ℱ and u + W, v + W ∈ V/W, we
define

(u + W) + (v + W) = (u + v) + W

and

α(u + W) = αu + W .

The first thing to do is show that these operations are well-defined. In other
words, if we suppose that u + W = u' + W and v + W = v' + W, then we must
show that (u + v) + W = (u' + v') + W and α(u + W) = α(u' + W). Using u +
W = u' + W and v + W = v' + W, Theorem 7.30 tells us that u - u' ∈ W and
v - v' ∈ W. But then

(u + v) - (u' + v') = (u - u') + (v - v') ∈ W

and hence (u + v) + W = (u' + v') + W. Next, we see that u - u' ∈ W implies
α(u - u') ∈ W since W is a subspace. Then αu - αu' ∈ W implies αu + W =
αu' + W, or α(u + W) = α(u' + W).

Theorem 7.32 Let V be a vector space over ℱ and W a subspace of V. For
any u + W, v + W ∈ V/W and α ∈ ℱ, we define the operations

(1) (u + W) + (v + W) = (u + v) + W

(2) α(u + W) = αu + W.

Then, with these operations, V/W becomes a vector space over ℱ.

Proof Since (0 + W) + (u + W) = (0 + u) + W = u + W, we see that W is the
zero element of V/W. Similarly, we see that -(u + W) = -u + W. In view of
the above discussion, all that remains is to verify axioms (V1) - (V8) for a
vector space given at the beginning of Section 2.1. We leave this as an
exercise for the reader (see Exercise 7.9.1). ■

The vector space V/W defined in this theorem is called the quotient space
of V by W. If V is finite-dimensional, then any subspace W ⊂ V is also
finite-dimensional, where in fact dim W ≤ dim V (Theorem 2.9). It is then natural to
ask about the dimension of V/W.

Theorem 7.33 Let V be finite-dimensional over ℱ and W be a subspace of
V. Then

dim V/W = dim V - dim W .

Proof Suppose {w_1, ..., w_m} is a basis for W. By Theorem 2.10 we can
extend this to a basis {w_1, ..., w_m, v_1, ..., v_r} for V, where dim V = m + r =
dim W + r. Then any v ∈ V may be written as

v = α_1 w_1 + ⋯ + α_m w_m + β_1 v_1 + ⋯ + β_r v_r

where {α_i}, {β_j} ∈ ℱ. For ease of notation, let us define V̄ = V/W and, for any
v ∈ V, we let v̄ = v + W ∈ V̄. Note that this association is linear because (by
Theorem 7.32)

\overline{v + v'} = v + v' + W = (v + W) + (v' + W) = v̄ + v̄'

and

\overline{kv} = kv + W = k(v + W) = k v̄ .

Since w_i ∈ W, we see that w̄_i = w_i + W = W, and hence v̄ = β_1 v̄_1 + ⋯ + β_r v̄_r .
Alternatively, any v̄ ∈ V̄ may be written as

v̄ = v + W = Σ_i α_i w_i + Σ_j β_j v_j + W = Σ_j β_j v_j + W = Σ_j β_j v̄_j

since Σ_i α_i w_i ∈ W. In any case, this shows that the v̄_i span V̄. Now suppose
that Σ_i γ_i v̄_i = 0̄ for some scalars γ_i ∈ ℱ. Then

γ_1 v̄_1 + ⋯ + γ_r v̄_r = 0̄ = W .

Using v̄_i = v_i + W, we then see that γ_1 v_1 + ⋯ + γ_r v_r + W = W which implies
that Σ_i γ_i v_i ∈ W. But {w_i} forms a basis for W, and hence there exist δ_1, ...,
δ_m ∈ ℱ such that

γ_1 v_1 + ⋯ + γ_r v_r = δ_1 w_1 + ⋯ + δ_m w_m .

However, {w_1, ..., w_m, v_1, ..., v_r} is a basis for V and hence is linearly
independent. This means that γ_i = 0 for each i = 1, ..., r and that δ_j = 0 for
each j = 1, ..., m. Thus {v̄_i} is linearly independent and forms a basis for V̄ =
V/W, and dim V/W = r = dim V - dim W. ■
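As a small worked illustration of this dimension count (an example chosen here, not one from the text), take V = ℝ³ and let W be the xy-plane. The basis {w_1, w_2} of W is extended by the single vector v_1, and the cosets are the horizontal planes z = c:

```latex
% Illustrative example: V = R^3, W = the xy-plane.
\[
W = \{(x, y, 0)\} \subset \mathbb{R}^3, \qquad
(a, b, c) + W = \{(x, y, c) : x, y \in \mathbb{R}\}.
\]
% Extend the basis {w_1, w_2} of W by v_1 to obtain a basis of V.
\[
\{w_1, w_2, v_1\} = \{(1,0,0),\ (0,1,0),\ (0,0,1)\}, \qquad
\bar{v}_1 = (0,0,1) + W .
\]
% Every coset is a multiple of \bar{v}_1, so V/W is one-dimensional.
\[
(a, b, c) + W = c\,\bar{v}_1, \qquad
\dim V/W = 1 = 3 - 2 = \dim V - \dim W .
\]
```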
There is a slightly different way of looking at this result that will be of use
to us later.

Theorem 7.34 Let V be finite-dimensional over ℱ and W be a subspace of
V. Suppose that W has basis w_1, ..., w_m and V̄ = V/W has basis v̄_1, ..., v̄_r
where v̄_i = v_i + W for some v_i ∈ V. Then {w_1, ..., w_m, v_1, ..., v_r} is a basis
for V.

Proof Let u ∈ V be arbitrary. Then ū = u + W ∈ V̄, and hence there exists
{α_i} ∈ ℱ such that

u + W = ū = α_1 v̄_1 + ⋯ + α_r v̄_r = α_1 v_1 + ⋯ + α_r v_r + W .

By Theorem 7.30 there exists w = β_1 w_1 + ⋯ + β_m w_m ∈ W such that

u = α_1 v_1 + ⋯ + α_r v_r + w = α_1 v_1 + ⋯ + α_r v_r + β_1 w_1 + ⋯ + β_m w_m .

This shows that {w_1, ..., w_m, v_1, ..., v_r} spans V.

To show that these vectors are linearly independent, we suppose that

γ_1 w_1 + ⋯ + γ_m w_m + δ_1 v_1 + ⋯ + δ_r v_r = 0 .

Since the association between V and V̄ is linear (see the proof of Theorem
7.33) and w̄_i = w_i + W = W, we see that

δ_1 v̄_1 + ⋯ + δ_r v̄_r = 0̄ = W .

But the v̄_i are linearly independent (since they are a basis for V̄), and hence
δ_1 = ⋯ = δ_r = 0. (This is just the definition of linear independence if we recall
that W is the zero vector in V̄ = V/W.) This leaves us with γ_1 w_1 + ⋯ + γ_m w_m
= 0. But again, the w_i are linearly independent, and therefore γ_1 = ⋯ = γ_m = 0.
This shows that {w_1, ..., w_m, v_1, ..., v_r} is a basis for V, and hence dim V =
dim W + dim V/W. ■

Exercises 



1. Finish the proof of Theorem 7.32.

2. Let U and V be finite-dimensional, and suppose T is a linear transformation
of U onto V. If W = Ker T, prove that V is isomorphic to U/W. [Hint:
See Exercise 1.5.11.]

3. Let V be a vector space over ℱ, and let W be a subspace of V. Define a
relation R on the set V by xRy if x - y ∈ W.

(a) Show that R defines an equivalence relation on V.

(b) Let the equivalence class of x ∈ V be denoted by [x], and define the
quotient set V/R to be the set of all such equivalence classes. For all x, y ∈
V and α ∈ ℱ we define addition and scalar multiplication in V/R by

[x] + [y] = [x + y]

and

α[x] = [αx] .

Show that these operations are well-defined, and that V/R is a vector space
over ℱ.

(c) Now assume that V is finite-dimensional, and define the mapping T:
V → V/R by Tx = [x]. Show that this defines a linear transformation.

(d) Using Theorem 5.6, prove that dim V/R + dim W = dim V.

7.10 THE TRIANGULAR FORM THEOREM *

Now that we know something about quotient spaces, let us look at the effect 
of a linear transformation on such a space. Unless otherwise noted, we restrict 
our discussion to finite-dimensional vector spaces. In particular, suppose that 
T ∈ L(V) and W is a T-invariant subspace of V. We first show that T induces
a natural linear transformation on the space V/W. (The reader should be
careful to note that 0̄ in Theorem 7.33 is the zero vector in V̄ = V/W, while in
the theorem below, 0̄ is the zero transformation on V̄.)

Theorem 7.35 Suppose T ∈ L(V) and let W be a T-invariant subspace of V.
Then T induces a linear transformation T̄ ∈ L(V/W) defined by

T̄(v + W) = T(v) + W .

Furthermore, if T satisfies any polynomial p(x) ∈ ℱ[x], then so does T̄. In
particular, the minimal polynomial m̄(x) for T̄ divides the minimal polynomial
m(x) for T.
Proof Our first task is to show that T̄ is well-defined and linear. Thus, suppose
v + W = v' + W. Then v - v' ∈ W so that T(v - v') = T(v) - T(v') ∈ W
since W is T-invariant. Therefore, using Theorem 7.30, we see that

T̄(v + W) = T(v) + W = T(v') + W = T̄(v' + W)

and hence T̄ is well-defined. To show that T̄ is a linear transformation, we
simply calculate

T̄[(v_1 + W) + (v_2 + W)] = T̄(v_1 + v_2 + W) = T(v_1 + v_2) + W
   = T(v_1) + T(v_2) + W = T̄(v_1 + W) + T̄(v_2 + W)

and

T̄[α(v + W)] = T̄(αv + W) = T(αv) + W = αT(v) + W
   = α[T(v) + W] = αT̄(v + W) .

This proves that T̄ is indeed a linear transformation.

Next we observe that for any T ∈ L(V), T² is a linear transformation and
W is also invariant under T². This means that we can calculate the effect of
\overline{T²} on any v + W ∈ V/W:

\overline{T²}(v + W) = T²(v) + W = T[T(v)] + W = T̄[T(v) + W]
   = T̄[T̄(v + W)] = T̄²(v + W) .

This shows that \overline{T²} = T̄², and it is easy to see that in general
\overline{T^m} = T̄^m for any m > 0. Then for any p(x) = a_0 + a_1 x + ⋯ + a_n x^n ∈ ℱ[x],
we have

\overline{p(T)}(v + W) = p(T)(v) + W = Σ_m a_m T^m(v) + W = Σ_m a_m [T^m(v) + W]
   = Σ_m a_m \overline{T^m}(v + W) = Σ_m a_m T̄^m(v + W) = p(T̄)(v + W)

so that \overline{p(T)} = p(T̄).

Now note that for any v + W ∈ V/W we have 0̄(v + W) = 0(v) + W = W,
and hence 0̄ is the zero transformation on V/W (since W is the zero vector in
V/W). Therefore, if p(T) = 0 for some p(x) ∈ ℱ[x], we see that 0̄ = \overline{p(T)} =
p(T̄) and hence T̄ satisfies p(x) also.

Finally, let m̄(x) be the minimal polynomial for T̄. If p(x) is such that
p(T̄) = 0̄, then we know that m̄|p (Theorem 7.4). If m(x) is the minimal
polynomial for T, then m(T) = 0, and therefore m(T̄) = 0̄ by what we just proved.
Hence m̄|m. ■
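A small worked example may help fix the idea of the induced transformation. The operator and subspace below are chosen for illustration and do not come from the text: T acts on ℝ² with T(e_1) = 2e_1 and T(e_2) = e_1 + 3e_2, and W is the T-invariant line spanned by e_1.

```latex
% Illustrative example: an upper-triangular operator on R^2 and W = span{e_1}.
\[
[T] = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad
W = \operatorname{span}\{e_1\}, \qquad T(W) \subset W .
\]
% V/W is one-dimensional with basis \bar{e}_2 = e_2 + W (Theorem 7.33).
\[
\bar{T}(e_2 + W) = T(e_2) + W = (e_1 + 3e_2) + W = 3e_2 + W = 3(e_2 + W) .
\]
% The induced operator is multiplication by 3, so its minimal polynomial
% divides that of T, in agreement with Theorem 7.35.
\[
\bar{m}(x) = x - 3, \qquad m(x) = (x - 2)(x - 3), \qquad \bar{m} \mid m .
\]
```

Note that the matrix of T̄ is simply the lower right block of [T] in a basis adapted to W.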
We now come to the main result of this section. By way of motivation, we
saw in Theorem 7.24 that a linear transformation T ∈ L(V) is diagonalizable if
and only if the minimal polynomial for T can be written as the product of
distinct linear factors. What we wish to do now is take into account the more
general case where all of the distinct factors of the minimal polynomial are
linear, but possibly with a multiplicity greater than one. We shall see that this
leads to a triangular form for the matrix representation of T.

For definiteness we consider upper-triangular matrices, so what we are
looking for is a basis {v_i} for V in which the action of T takes the form

T(v_1) = v_1 a_{11}
T(v_2) = v_1 a_{12} + v_2 a_{22}
   ⋮
T(v_n) = v_1 a_{1n} + v_2 a_{2n} + ⋯ + v_n a_{nn} .

We present two versions of this theorem in the present chapter. The first is 
more intuitive, while the second requires the development of some additional 
material that is also of use in other applications. Because of this, we postpone 
the second version until after we have discussed nilpotent transformations in 
the next section. (Actually, Exercises 7.5.7 and 7.5.8 outlined another way to 
prove the first version that used the formalism of ideals and annihilators, and 
made no reference whatsoever to quotient spaces.) Furthermore, in Section 8.1 
we give a completely independent proof for the special case of matrices over 
an algebraically closed field. 

Recall from Theorem 7.9 that an element λ ∈ ℱ is an eigenvalue of T if
and only if λ is a root of the characteristic polynomial of T. Thus all the
eigenvalues of T lie in ℱ if and only if the characteristic polynomial factors
into linear terms.

Theorem 7.36 (Triangular Form Theorem) Suppose T ∈ L(V) has a
characteristic polynomial that factors into (not necessarily distinct) linear terms.
Then V has a basis in which the matrix of T is triangular.

Proof If dim V = 1, then T is represented by a 1 x 1 matrix which is certainly
triangular. Now suppose that dim V = n > 1. We assume the theorem is true
for dim V = n - 1, and proceed by induction to prove the result for dim V = n.
Since the characteristic polynomial of T factors into linear polynomials, there
exists at least one eigenvalue λ_1 and a corresponding (nonzero) eigenvector v_1
such that T(v_1) = λ_1 v_1 = a_{11} v_1. Let W be the one-dimensional T-invariant
subspace spanned by v_1, and define V̄ = V/W so that (by Theorem 7.33)
dim V̄ = dim V - dim W = n - 1.
According to Theorem 7.35, T induces a linear transformation T̄ on V̄
such that the minimal polynomial m̄(x) for T̄ divides the minimal polynomial
m(x) for T. But this means that any root of m̄(x) is also a root of m(x). Since
the characteristic polynomial of T factors into linear polynomials by
hypothesis, so does m(x) (see Theorem 7.12). Therefore m̄(x) must also factor into
linear polynomials, and hence so does the characteristic polynomial of T̄. This
shows that T̄ and V̄ satisfy the hypotheses of the theorem, and hence there
exists a basis {v̄_2, ..., v̄_n} for V̄ = V/W such that

T̄(v̄_2) = v̄_2 a_{22}
T̄(v̄_3) = v̄_2 a_{23} + v̄_3 a_{33}
   ⋮
T̄(v̄_n) = v̄_2 a_{2n} + ⋯ + v̄_n a_{nn} .

We now let v_2, ..., v_n be elements of V such that v̄_i = v_i + W for each i =
2, ..., n. Since W has basis {v_1}, Theorem 7.34 tells us that {v_1, v_2, ..., v_n}
is a basis for V. According to our above result, we have T̄(v̄_2) = v̄_2 a_{22} which is
equivalent to T̄(v̄_2) - v̄_2 a_{22} = 0̄, and hence the definition of T̄ (see Theorem
7.35) along with Theorem 7.30 tells us that T(v_2) - v_2 a_{22} ∈ W. Since W is
spanned by v_1, this says there exists a_{12} ∈ ℱ such that T(v_2) - v_2 a_{22} = v_1 a_{12},
i.e., T(v_2) = v_1 a_{12} + v_2 a_{22}. Clearly, an identical argument holds for any of the
T̄(v̄_i), and thus for each i = 2, ..., n there exists a_{1i} ∈ ℱ such that T(v_i) -
v_2 a_{2i} - ⋯ - v_i a_{ii} ∈ W implies

T(v_i) = v_1 a_{1i} + v_2 a_{2i} + ⋯ + v_i a_{ii} .

Written out, this is just

T(v_1) = v_1 a_{11}
T(v_2) = v_1 a_{12} + v_2 a_{22}
   ⋮
T(v_n) = v_1 a_{1n} + v_2 a_{2n} + ⋯ + v_n a_{nn} .

In other words, the elements v_1, ..., v_n ∈ V are a basis for V in which every
T(v_i) is a linear combination of v_j for j ≤ i. This is precisely the definition of
triangular form. ■

We now give a restatement of this theorem in terms of matrices. For ease 
of reference, this version is presented as a corollary. 
Corollary Let A ∈ M_n(ℱ) be a matrix whose characteristic polynomial
factors into linear polynomials. Then A is similar to a triangular matrix.

Proof The matrix A = (a_{ij}) defines a linear transformation T on the space
ℱ^n by T(v_i) = Σ_{j=1}^{n} v_j a_{ji} where {v_i} is a basis for ℱ^n. In particular,
relative to the basis {v_i}, the matrix representation of T is precisely the matrix
A (since T takes the ith basis vector into the ith column of the matrix
representing T). Since the characteristic polynomial of T is independent of the
basis used in the matrix representation of T, and the characteristic polynomial
of A factors into linear polynomials, we see that Theorem 7.36 applies to T.
Thus there is a basis for ℱ^n in which the matrix of T (i.e., the matrix A) is
triangular. By Theorem 5.18 we then see that A must be similar to a triangular
matrix. ■

If a linear transformation T can be represented by a triangular matrix, then
we say that T can be brought into triangular form. Since λ is an eigenvalue
of T if and only if det(λ1 - T) = 0 (Theorem 7.9), Theorem 4.5 tells us that the
eigenvalues of a triangular matrix are precisely the diagonal elements of the
matrix (this was also discussed in the previous section).
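The inductive construction in the proof of Theorem 7.36 can be followed by hand for a 2 x 2 matrix. The sketch below is an illustration under assumptions: the matrix A (characteristic polynomial (x - 2)²), the eigenvector v_1 = (1, 1) and the extension vector v_2 = (0, 1) are chosen here and are not from the text, and the helpers `matmul`, `inverse_2x2` and `is_upper_triangular` are hypothetical names. The columns of P are v_1 and v_2, so P⁻¹AP is the matrix of T in the new basis.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(p):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = p
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def is_upper_triangular(m):
    """True when every entry below the main diagonal vanishes."""
    return all(m[i][j] == 0 for i in range(len(m)) for j in range(i))

# Assumed example: A is not diagonalizable, but its characteristic
# polynomial (x - 2)^2 factors into linear terms.
A = [[1, 1],
     [-1, 3]]

# Columns of P: the eigenvector v_1 = (1, 1), then v_2 = (0, 1), a
# representative of the single basis coset of V/W with W = span{v_1}.
P = [[1, 0],
     [1, 1]]

B = matmul(inverse_2x2(P), matmul(A, P))
print(B, is_upper_triangular(B))
```

The diagonal of the resulting matrix carries the repeated eigenvalue 2, and the single off-diagonal entry is the coefficient a_{12} from the proof.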

7.11 NILPOTENT TRANSFORMATIONS * 

An operator T ∈ L(V) is said to be nilpotent if T^n = 0 for some positive
integer n. If T^k = 0 but T^{k-1} ≠ 0, then k is called the index of nilpotency of T
(note that T^{k-1} ≠ 0 implies that T^j ≠ 0 for all j ≤ k - 1). This same terminology
applies to any square matrix A with the property that A^n = 0. Some
elementary facts about nilpotent transformations are contained in the following
theorem. Note Theorem 7.1 implies that if A is the matrix representation of T
and T is nilpotent with index k, then A is also nilpotent with index k.

Theorem 7.37 Suppose T ∈ L(V), and assume that for some v ∈ V we have
T^k(v) = 0 but T^{k-1}(v) ≠ 0. Define the set

S = {v, T(v), T²(v), ..., T^{k-1}(v)} .

Then S has the following properties:

(a) The elements of S are linearly independent.

(b) The linear span W of S is a T-invariant subspace of V.

(c) The operator T_W = T|W is nilpotent with index k.

(d) With respect to the ordered basis {T^{k-1}(v), ..., T(v), v} for W, the
matrix of T_W has the form

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix} .
\]

Thus the matrix of T_W has all zero entries except on the superdiagonal where
they are all equal to one. This shows that the matrix representation of T_W is a
nilpotent matrix with index k.

Proof (a) Suppose that

α_0 v + α_1 T(v) + ⋯ + α_{k-1} T^{k-1}(v) = 0

for some set of scalars α_i ∈ ℱ. Applying T^{k-1} to this equation results in
α_0 T^{k-1}(v) = 0. Since T^{k-1}(v) ≠ 0, this implies that α_0 = 0. Using this result,
apply T^{k-2} to the above equation to obtain α_1 = 0. Continuing this procedure,
we eventually arrive at α_i = 0 for each i = 0, 1, ..., k - 1 and hence the
elements of S are linearly independent.

(b) Since any w ∈ W may be written in the form

w = β_0 v + β_1 T(v) + ⋯ + β_{k-1} T^{k-1}(v)

we see that T(w) = β_0 T(v) + ⋯ + β_{k-2} T^{k-1}(v) ∈ W, and hence T(W) ⊂ W.

(c) Using T^k(v) = 0, it follows that T_W^k(T^i(v)) = T^{k+i}(v) = 0 for each i =
0, ..., k - 1. This shows that T_W^k applied to each element of S (i.e., each of
the basis vectors for W) is zero, and thus T_W^k = 0. In addition, since v ∈ W
we see that T_W^{k-1}(v) = T^{k-1}(v) ≠ 0, and therefore T_W is nilpotent with index
k.

(d) Using T_W(T^i(v)) = T^{i+1}(v) along with the fact that the ith column of
[T_W] is the image of the ith basis vector for W, it is easy to see that [T_W] has
the desired form. ■
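Part (d) can be verified mechanically for a specific nilpotent matrix. The following is a minimal sketch, assuming a hypothetical strictly upper-triangular 3 x 3 matrix T and the starting vector v = (0, 0, 1); the names `apply`, `chain` and `matrix_in_chain_basis` are illustrative and not from the text. The routine builds S = {v, T(v), ...}, reverses it to obtain the ordered basis of part (d), and records where each basis vector is sent.

```python
def apply(a, v):
    """Apply the matrix a to the vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def chain(T, v):
    """Return [v, T(v), T^2(v), ...] up to the last nonzero vector.

    Assumes some power T^k with k <= dim V annihilates v.
    """
    vectors = []
    while any(v):
        if len(vectors) == len(T):
            raise ValueError("no power T^k with k <= dim V annihilates v")
        vectors.append(v)
        v = apply(T, v)
    return vectors

def matrix_in_chain_basis(T, v):
    """Matrix of T restricted to W relative to {T^(k-1) v, ..., T v, v}."""
    basis = chain(T, v)[::-1]
    k = len(basis)
    M = [[0] * k for _ in range(k)]
    for i, b in enumerate(basis):
        image = apply(T, b)
        if i == 0:
            assert not any(image)         # T annihilates T^(k-1)(v)
        else:
            assert image == basis[i - 1]  # T sends the ith basis vector to the (i-1)th
            M[i - 1][i] = 1               # so column i has a single 1 on the superdiagonal
    return M

# Assumed example: a nilpotent matrix of index 3 and the vector v = e_3.
T = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
v = [0, 0, 1]

print(chain(T, v))
print(matrix_in_chain_basis(T, v))
```

In this example W is all of ℝ³, so the superdiagonal matrix is the full matrix of T in the chain basis.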

One must be careful to understand exactly what Theorem 7.37 says and
what it does not say. In particular, if T is nilpotent with index k, then T^k(u) = 0
for all u ∈ V, while T^{k-1}(v) ≠ 0 for some (but not every) v ∈ V. This is because
if w = T(v), then T^{k-1}(w) = T^{k-1}(T(v)) = T^k(v) = 0. Hence it is impossible for
T^{k-1}(v) to be nonzero for all v ∈ V.
It is also interesting to note that according to Theorem 7.37(b), the subspace
W is T-invariant, and hence by Theorem 7.19, the matrix of T must be
of the block form

\[
\begin{pmatrix} M_k & B \\ 0 & C \end{pmatrix}
\]

where M_k is the k x k matrix of T_W = T|W defined in part (d) of Theorem
7.37. If we can find another T-invariant subspace U of V such that V = W ⊕
U, then the matrix representation of T will be in block diagonal form
(Theorem 7.20). We now proceed to show that this can in fact be done. Let us
first prove two more easy results.

Theorem 7.38 Let T ∈ L(V) be nilpotent, and let S = α_1 T + ⋯ + α_m T^m
where each α_i ∈ ℱ. Then α_0 1 + S is invertible for any nonzero α_0 ∈ ℱ.

Proof Suppose the index of T is k. Then T^k = 0, and therefore S^k = 0 also
since the lowest power of T in the expansion of S^k is k. If α_0 ≠ 0, we leave it
as a trivial exercise for the reader to show that

(α_0 1 + S)(1/α_0 - S/α_0² + S²/α_0³ - ⋯ + (-1)^{k-1} S^{k-1}/α_0^k) = 1 .

This shows that α_0 1 + S is invertible, and that its inverse is given by the above
polynomial in S. ■
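The inverse formula in this proof is a finite geometric series and is easy to test. The sketch below is an illustration under assumptions: the nilpotent matrix T of index 3, the coefficients α_0 = 2, α_1 = α_2 = 1, and the function name `inverse_of_scalar_plus_nilpotent` are chosen here and are not from the text.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def inverse_of_scalar_plus_nilpotent(alpha0, S, k):
    """Return sum_{i=0}^{k-1} (-1)^i S^i / alpha0^(i+1).

    Assumes alpha0 != 0 and S^k = 0, as in the proof of Theorem 7.38.
    """
    n = len(S)
    alpha0 = Fraction(alpha0)
    result = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)                      # S^0
    for i in range(k):
        coeff = Fraction((-1) ** i) / alpha0 ** (i + 1)
        result = add(result, scale(coeff, power))
        power = matmul(power, S)             # S^(i+1)
    return result

# Assumed example: T is nilpotent with index 3 and S = T + T^2.
T = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
S = add(T, matmul(T, T))
M = add(scale(2, identity(3)), S)            # alpha_0 1 + S with alpha_0 = 2
M_inv = inverse_of_scalar_plus_nilpotent(2, S, 3)

print(matmul(M, M_inv) == identity(3))
```

Because S is a polynomial in T with no constant term, the same index k that annihilates T also annihilates S, which is why the series can be truncated after k terms.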

Theorem 7.39 Let T ∈ L(V) be nilpotent with index n, and let W be the
T-invariant subspace spanned by {T^{n-1}(v), ..., T(v), v} where v ∈ V is such
that T^{n-1}(v) ≠ 0. If w ∈ W is such that T^{n-k}(w) = 0 for some 0 < k ≤ n, then
there exists w_0 ∈ W such that T^k(w_0) = w.

Proof Since w ∈ W, we have

w = α_n T^{n-1}(v) + ⋯ + α_{k+1} T^k(v) + α_k T^{k-1}(v) + ⋯ + α_2 T(v) + α_1 v

and therefore (since T^n = 0)

0 = T^{n-k}(w) = α_k T^{n-1}(v) + ⋯ + α_1 T^{n-k}(v) .

But {T^{n-1}(v), ..., T^{n-k}(v)} is linearly independent (Theorem 7.37), and thus
α_k = ⋯ = α_1 = 0. This means that

w = α_n T^{n-1}(v) + ⋯ + α_{k+1} T^k(v) = T^k(w_0)

where w_0 = α_n T^{n-k-1}(v) + ⋯ + α_{k+1} v ∈ W. ■

We are now in a position to prove our above assertion on the decomposition
of V. This (by no means trivial) result will form the basis of the principal
theorem dealing with nilpotent transformations (Theorem 7.41 below). It is
worth pointing out that while the following theorem will be quite useful to us,
its proof is not very constructive.

Theorem 7.40 Let T and W be as defined in the previous theorem. Then
there exists a T-invariant subspace U of V such that $V = W \oplus U$.

Proof Let $U \subset V$ be a T-invariant subspace of largest dimension with the
property that $W \cap U = \{0\}$. (Such a space exists since even $\{0\}$ is
T-invariant, and $W \cap \{0\} = \{0\}$.) We first show that $V = W + U$. If this
is not the case, then there exists $z \in V$ such that $z \notin W + U$. Since
$T^0(z) = z \notin W + U$ while $T^n(z) = 0 \in W + U$, it follows that there must
exist an integer k with $0 < k \le n$ such that $T^k(z) \in W + U$ and
$T^j(z) \notin W + U$ for $j < k$. We write $T^k(z) = w + u$ where $w \in W$ and
$u \in U$, and therefore

\[
0 = T^n(z) = T^{n-k}(T^k(z)) = T^{n-k}(w) + T^{n-k}(u) .
\]

Since both W and U are T-invariant, we have $T^{n-k}(w) \in W$ and
$T^{n-k}(u) \in U$. But $W \cap U = \{0\}$ so that

\[
T^{n-k}(w) = -T^{n-k}(u) \in W \cap U = \{0\} .
\]

(Remember that W and U are subspaces so $x \in U$ implies that $-x \in U$ also.)
We now apply Theorem 7.39 to conclude that there exists $w_0 \in W$ such that
$T^k(w_0) = w$, and hence $T^k(z) = w + u = T^k(w_0) + u$. Defining
$x = z - w_0$, we then have

\[
T^k(x) = T^k(z) - T^k(w_0) = u \in U .
\]

But U is T-invariant, and hence it follows that $T^m(x) \in U$ for any
$m \ge k$. Considering lower powers of T, let us assume that $j < k$. Then the
T-invariance of W implies $T^j(w_0) \in W$, while we saw above that
$T^j(z) \notin W + U$. This means that

\[
T^j(x) = T^j(z) - T^j(w_0) \notin W + U ,
\]

and in particular $T^j(x) \notin U$ (because if $T^j(x) \in W + U$, then
$T^j(z) = T^j(w_0) + T^j(x) \in W + U$, a contradiction).

Now let $U_x$ be that subspace of V spanned by U together with the set
$\{T^{k-1}(x), \ldots, T(x), x\}$. Since U is T-invariant and $T^j(x) \notin U$,
it must be true that $x \notin U$. Together with $U \subset U_x$, this means
$\dim U_x > \dim U$. Applying T to $\{T^{k-1}(x), \ldots, T(x), x\}$ we obtain
the set $\{T^k(x), T^{k-1}(x), \ldots, T^2(x), T(x)\}$. Since $T^k(x) \in U$ and
the rest of the vectors in this set are included in the set that spans $U_x$,
it follows that $U_x$ is also T-invariant.

By assumption, U is the subspace of largest dimension that is both
T-invariant and satisfies $W \cap U = \{0\}$. Since $\dim U_x > \dim U$ and $U_x$
is T-invariant, we must have $W \cap U_x \neq \{0\}$. Therefore there exists a
nonzero element in $W \cap U_x$ of the form
$u_0 + a_k T^{k-1}(x) + \cdots + a_2 T(x) + a_1 x$ where $u_0 \in U$. We can not
have $a_i = 0$ for every $i = 1, \ldots, k$ because this would imply that
$0 \neq u_0 \in W \cap U = \{0\}$, a contradiction. If we let $a_r \neq 0$ be the
first nonzero $a_i$, then we have

\[
u_0 + (a_k T^{k-r} + \cdots + a_{r+1} T + a_r)\, T^{r-1}(x) \in W . \qquad (*)
\]

From Theorem 7.38 we see that $a_k T^{k-r} + \cdots + a_{r+1} T + a_r$ is
invertible, and its inverse is given by some polynomial p(T). Since W and U are
T-invariant, they are also invariant under p(T).

Applying p(T) to (*), we see that

\[
p(T)(u_0) + T^{r-1}(x) \in p(T)(W) \subset W .
\]

This means that $T^{r-1}(x) \in W + p(T)(U) \subset W + U$. But
$r - 1 < r \le k$, and hence this result contradicts the earlier conclusion
that $T^j(x) \notin W + U$ for $j < k$. Since this contradiction arose from the
assumed existence of an element $z \in V$ with $z \notin W + U$, we conclude that
$V = W + U$. Finally, since $W \cap U = \{0\}$ by hypothesis, we have
$V = W \oplus U$. ∎

Combining several previous results, the next major theorem follows quite 
easily. 

Theorem 7.41(a) Let $T \in L(V)$ be nilpotent with index of nilpotence $n_1$,
and let $M_k$ be the $k \times k$ matrix containing all 0's except for 1's on the
superdiagonal (see Theorem 7.37). Then there exists a basis for V in which the
matrix of T has the block diagonal form

\[
\begin{pmatrix}
M_{n_1} & 0       & \cdots & 0 \\
0       & M_{n_2} & \cdots & 0 \\
\vdots  & \vdots  &        & \vdots \\
0       & 0       & \cdots & M_{n_r}
\end{pmatrix}
\]

where $n_1 \ge \cdots \ge n_r$ and $n_1 + \cdots + n_r = \dim V$.

Proof Since $T^{n_1} = 0$ but $T^{n_1 - 1} \neq 0$, there exists $v \in V$ such
that $T^{n_1 - 1}(v) \neq 0$. Applying Theorem 7.37, we see that the vectors
$v_1 = T^{n_1 - 1}(v), \ldots, v_{n_1 - 1} = T(v), v_{n_1} = v$ are linearly
independent and form the basis for a T-invariant subspace $W_1$ of V. Moreover,
the matrix of $T_1 = T|W_1$ in this basis is just $M_{n_1}$.

By Theorem 7.40, there exists a T-invariant subspace $U \subset V$ such that
$V = W_1 \oplus U$. Define a basis for V by taking the basis
$\{v_1, \ldots, v_{n_1}\}$ for $W_1$ together with any basis for U. Then,
according to Theorem 7.20, the matrix of T with respect to this basis is of the
form

\[
\begin{pmatrix} M_{n_1} & 0 \\ 0 & A_2 \end{pmatrix}
\]

where $A_2$ is the matrix of $T_2 = T|U$. For any $u \in U$ and positive integer
m we have $T_2^{\,m}(u) = T^m(u)$. Since $T^{n_1} = 0$, we see that
$T_2^{\,m} = 0$ for all $m \ge n_1$, and thus there exists a least integer
$n_2 \le n_1$ such that $T_2^{\,n_2} = 0$. This shows that $T_2$ is nilpotent
with index $n_2$.

We now repeat the above argument using $T_2$ and U instead of T and V.
This time we will decompose $A_2$ into

\[
\begin{pmatrix} M_{n_2} & 0 \\ 0 & A_3 \end{pmatrix}
\]

and therefore the representation of T becomes

\[
\begin{pmatrix}
M_{n_1} & 0       & 0 \\
0       & M_{n_2} & 0 \\
0       & 0       & A_3
\end{pmatrix} .
\]

Continuing this process, it should be clear that we will eventually arrive at a
basis for V in which the matrix of T has the desired form. It is also obvious
that $\sum_{i=1}^{r} n_i = \dim V = n$ since the matrix of T must be of size n. ∎

Our next result is a rephrasing of Theorem 7.41(a).

Theorem 7.41(b) Let $T \in L(V)$ be nilpotent with index k. Then there exists
a basis for V in which the matrix representation of T is block diagonal, and
where each of these diagonal entries (i.e., square matrices) is of the
superdiagonal form M given in Theorem 7.37. Moreover,

(a) There is at least one M matrix of size k, and every other M matrix is of
size $\le k$.

(b) The total number of M matrices in the representation of T (i.e., the
total number of blocks in the representation of T) is just nul T = dim(Ker T).

Proof See Exercise 7.11.1. ∎
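The two quantities that Theorem 7.41(b) uses, the index of nilpotence and the nullity, are easy to compute for a concrete matrix. The sketch below does so in exact arithmetic. The 5 x 5 matrix A is a hypothetical sample chosen for illustration and is not taken from the text; `rank` and `nilpotency_index` are helper names introduced here.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(A):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def nilpotency_index(A):
    """Smallest k with A^k = 0, or None if A is not nilpotent."""
    n = len(A)
    P = A
    for k in range(1, n + 1):
        if all(x == 0 for row in P for x in row):
            return k
        P = matmul(P, A)
    return None


# Hypothetical sample matrix (an assumption for illustration only).
A = [[0, 1, 1, 0, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]

index = nilpotency_index(A)
nullity = len(A) - rank(A)
print(index, nullity)  # 3 3
```

For this sample, part (a) forces one block of size 3 and part (b) forces three blocks in all, so the remaining two blocks must each have size 1 in order for the sizes to sum to 5.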

Example 7.13 Consider the matrix

[5 × 5 matrix A; its entries are not recoverable here]

We leave it to the reader to show that A is nilpotent with index 3, i.e.,
$A^3 = 0$ but $A^2 \neq 0$. We seek the block diagonal representation of A
described in Theorem 7.41(b). It is obvious that r(A) = 2, and therefore (using
Theorem 5.6) nul A = 5 - 2 = 3. Thus there are three M matrices in the block
diagonal representation of A, and one of them must be of size 3. This means
that the only possibility for the remaining two matrices is that they both be
of size 1. Thus the block diagonal form for A must be

\[
\begin{pmatrix} M_3 & & \\ & M_1 & \\ & & M_1 \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix} .
\]

It is easy to see that this matrix is also nilpotent with index 3. ∥

Exercises

1. Prove Theorem 7.41(b). [Hint. What is the rank of the matrix in Theorem 
7.41(a)?] 

2. Suppose $S, T \in L(V)$ are nilpotent with the property that ST = TS. Show
that S + T and ST are also nilpotent.

3. Suppose A is a supertriangular matrix, i.e., all entries of A on or below
the main diagonal are zero. Show that A is nilpotent.

4. Let $V_n$ be the vector space of all polynomials of degree $\le n$, and let
$D \in L(V_n)$ be the usual differentiation operator. Show that D is nilpotent
with index n + 1.

5. Show that the following nilpotent matrices of size n are similar:

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix} .
\]

6. Show that two nilpotent 3 × 3 matrices are similar if and only if they have
the same index of nilpotency. Give an example to show that this is not true
for nilpotent 4 × 4 matrices.

7.12 THE TRIANGULAR FORM THEOREM AGAIN * 

After all the discussion on nilpotent transformations in the previous section, 
let us return to our second version of the triangular form theorem which, as we 
shall see in the next chapter, is just the Jordan canonical form. While this 
theorem applies to a finite-dimensional vector space over an arbitrary field, 
the minimal polynomial m(x) for T must be factorable into linear polynomials. 
This means that all the roots of m(x) must lie in $\mathcal{F}$. Clearly, this
will always be true if $\mathcal{F}$ is algebraically closed.

Theorem 7.42 (Jordan Form) Suppose $T \in L(V)$, and assume that the
minimal polynomial m(x) for T can be written in the form

\[
m(x) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_r)^{n_r}
\]

where each $n_i$ is a positive integer, and the $\lambda_i$ are distinct
elements of $\mathcal{F}$. Then there exists a basis for V in which the matrix
representation A of T has the block diagonal form
$A = A_1 \oplus \cdots \oplus A_r$ where each $A_i \in M_{k_i}(\mathcal{F})$ for
some integer $k_i \ge n_i$, and each $A_i$ has the upper triangular form

\[
\begin{pmatrix}
\lambda_i & *         & 0      & \cdots & 0         & 0 \\
0         & \lambda_i & *      & \cdots & 0         & 0 \\
\vdots    & \vdots    & \vdots &        & \vdots    & \vdots \\
0         & 0         & 0      & \cdots & \lambda_i & * \\
0         & 0         & 0      & \cdots & 0         & \lambda_i
\end{pmatrix}
\]

where the *'s may be either 0 or 1.

Proof For each $i = 1, \ldots, r$ we define
$W_i = \mathrm{Ker}(T - \lambda_i 1)^{n_i}$ and $k_i = \dim W_i$. By the primary
decomposition theorem (Theorem 7.23), we see that
$V = W_1 \oplus \cdots \oplus W_r$ (where each $W_i$ is T-invariant), and hence
according to Theorem 2.15, V has a basis which is just the union of bases of
the $W_i$. Letting the basis for V be the union of the bases for
$W_1, \ldots, W_r$ taken in this order, it follows from Theorem 7.20 that
$A = A_1 \oplus \cdots \oplus A_r$ where each $A_i$ is the matrix representation
of $T_i = T|W_i$. We must show that each $A_i$ has the required form, and that
$k_i \ge n_i$ for each $i = 1, \ldots, r$.

If we define $N_i = T - \lambda_i 1$, then $N_i \in L(W_i)$ since $W_i$ is
T-invariant. In other words, $N_i$ is a linear operator defined on the space
$W_i$, and hence so is $N_i^{\,n_i}$. However, since
$W_i = \mathrm{Ker}\, N_i^{\,n_i}$, it follows from the definition of kernel that
$N_i^{\,n_i} = 0$ so that $N_i$ is nilpotent. The result now follows by applying
Theorem 7.41(a) to each $N_i$ and writing $T = N_i + \lambda_i 1$. ∎

Note that each $A_i$ in this theorem is a direct sum of matrices of the form

\[
\begin{pmatrix}
\lambda_i & 1         & 0      & \cdots & 0         & 0 \\
0         & \lambda_i & 1      & \cdots & 0         & 0 \\
\vdots    & \vdots    & \vdots &        & \vdots    & \vdots \\
0         & 0         & 0      & \cdots & \lambda_i & 1 \\
0         & 0         & 0      & \cdots & 0         & \lambda_i
\end{pmatrix}
\]

which are referred to as basic Jordan blocks belonging to $\lambda_i$. This
theorem will be discussed in much more detail (and from an entirely different
point of view) in Chapter 8.

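Many of our earlier results follow in a natural manner as corollaries to
Theorem 7.42. In particular, suppose that V is finite-dimensional over an
algebraically closed field $\mathcal{F}$, and $T \in L(V)$ satisfies the
hypotheses of Theorem 7.42. We wish to know the form of the characteristic
polynomial $\Delta_T(x)$. Relative to the basis for V given by Theorem 7.42, we
see that the characteristic matrix $xI - A$ is given by

\[
xI - A =
\begin{pmatrix}
x - \lambda_1 & *      &               &        &               &        &               \\
              & \ddots & *             &        &               &        &               \\
              &        & x - \lambda_1 &        &               &        &               \\
              &        &               & \ddots &               &        &               \\
              &        &               &        & x - \lambda_r & *      &               \\
              &        &               &        &               & \ddots & *             \\
              &        &               &        &               &        & x - \lambda_r
\end{pmatrix}
\]

(all entries not shown are zero, and $x - \lambda_i$ appears $k_i$ times on the
diagonal), and hence (using Theorem 4.5)

\[
\Delta_T(x) = \det(xI - A) = \prod_{i=1}^{r} (x - \lambda_i)^{k_i} .
\]

On the other hand, since $m(x) = \prod_{i=1}^{r} (x - \lambda_i)^{n_i}$ and
$k_i \ge n_i$, properties (a) and (b) in the following corollary should be
clear, and property (c) follows from the proof of Theorem 7.42. Note that
property (c) is just the Cayley-Hamilton theorem again.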

Corollary Suppose $T \in L(V)$ where V is finite-dimensional over an
algebraically closed field $\mathcal{F}$, and let the minimal and characteristic
polynomials of T be m(x) and $\Delta_T(x)$ respectively. Then

(a) $m(x) \mid \Delta_T(x)$.

(b) m(x) and $\Delta_T(x)$ have the same roots (although they are not
necessarily of the same algebraic multiplicity).

(c) $\Delta_T(T) = 0$.



Example 7.14 Let $V = \mathbb{C}^3$ have basis $\{v_1, v_2, v_3\}$ and define
$T \in L(V)$ by

\[
\begin{aligned}
T(v_1) &= -v_1 + 2v_3 \\
T(v_2) &= 3v_1 + 2v_2 + v_3 \\
T(v_3) &= -v_3 .
\end{aligned}
\]

Then the matrix representation of T in this basis is

\[
A = \begin{pmatrix} -1 & 3 & 0 \\ 0 & 2 & 0 \\ 2 & 1 & -1 \end{pmatrix} .
\]


We first find the minimal polynomial for T. Note that while we have given 
many theorems dealing with the minimal polynomial, there has as yet been no 
general method presented for actually finding it. (We shall see that such a 
method can be based on Theorem 8.8.) Since the minimal polynomial has the 
same irreducible factors as does the characteristic polynomial (Theorem 7.12), 
we begin by finding $\Delta_T(x) = \det(xI - A)$. A simple calculation yields

\[
\Delta_T(x) = (x + 1)^2 (x - 2)
\]

and therefore the minimal polynomial must be either

\[
(x + 1)^2 (x - 2)
\]

or

\[
(x + 1)(x - 2) .
\]



To decide between these two possibilities, we could simply substitute A
and multiply them out. However, it is worthwhile to instead apply Theorem
7.23. In other words, we find the subspaces $W_i = \mathrm{Ker}\, f_i(T)^{n_i}$
and see which value of $n_1$ (i.e., either 1 or 2) results in
$V = W_1 \oplus W_2$. We must therefore find the kernel (i.e., the null space)
of $(T + 1)$, $(T + 1)^2$ and $(T - 2)$. Applying the operator T + 1 to each of
the basis vectors yields

\[
\begin{aligned}
(T + 1)(v_1) &= 2v_3 \\
(T + 1)(v_2) &= 3v_1 + 3v_2 + v_3 \\
(T + 1)(v_3) &= 0 .
\end{aligned}
\]



Since Im(T + 1) is spanned by these three vectors, only two of which are
obviously independent, we see that r(T + 1) = 2. Therefore, applying Theorem
5.6, we find that nul(T + 1) = dim V - r(T + 1) = 1. Similarly, we have

\[
\begin{aligned}
(T + 1)^2(v_1) &= 0 \\
(T + 1)^2(v_2) &= 9(v_1 + v_2 + v_3) \\
(T + 1)^2(v_3) &= 0
\end{aligned}
\]

and so $r((T + 1)^2) = 1$ implies $\mathrm{nul}((T + 1)^2) = 2$. It should also
be clear that the space $\mathrm{Ker}(T + 1)^2$ is spanned by the set
$\{v_1, v_3\}$. Finally,

\[
\begin{aligned}
(T - 2)(v_1) &= -3v_1 + 2v_3 \\
(T - 2)(v_2) &= 3v_1 + v_3 \\
(T - 2)(v_3) &= -3v_3
\end{aligned}
\]

so that r(T - 2) = 2 and hence nul(T - 2) = 1. We also note that since

\[
(T - 2)(v_1 + v_2 + v_3) = 0
\]

and nul(T - 2) = 1, it follows that the space $W_2 = \mathrm{Ker}(T - 2)$ must be
spanned by the vector $v_1 + v_2 + v_3$. Alternatively, we could assume that
$W_2$ is spanned by some vector
$u = \alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3$ and proceed to find
$\alpha_1$, $\alpha_2$ and $\alpha_3$ by requiring that $(T - 2)(u) = 0$. This
results in

\[
(T - 2)(u) = \alpha_1(-3v_1 + 2v_3) + \alpha_2(3v_1 + v_3) + \alpha_3(-3v_3) = 0
\]

so that we have the simultaneous set of equations

\[
\begin{aligned}
-3\alpha_1 + 3\alpha_2 &= 0 \\
2\alpha_1 + \alpha_2 - 3\alpha_3 &= 0 .
\end{aligned}
\]

This yields $\alpha_1 = \alpha_2 = \alpha_3$ so that $u = v_1 + v_2 + v_3$ will
span $W_2$ as we found by inspection.

From the corollary to Theorem 2.15 we have
$\dim V = \dim W_1 + \dim W_2$, and since $\dim W_2 = \mathrm{nul}(T - 2) = 1$,
it follows that we must have $\dim W_1 = 2$, and hence
$W_1 = \mathrm{Ker}(T + 1)^2$. Thus the minimal polynomial for T must be given by

\[
m(x) = (x + 1)^2 (x - 2) .
\]

Note that because of this form for m(x), Theorem 7.24 tells us that T is not
diagonalizable.

According to Theorem 7.42, the matrix $A_1$ corresponding to $\lambda = -1$
must be at least a 2 × 2 matrix, and the matrix $A_2$ corresponding to
$\lambda = 2$ must be at least a 1 × 1 matrix. However, since dim V = 3, these
are in fact the actual sizes required. While $A_2$ is unambiguous, the matrix
$A_1$ could be either a single 2 × 2 matrix, or it could be a direct sum of two
1 × 1 matrices. To resolve this we use Theorem 7.41(b) which tells us that the
number of blocks in the representation of the nilpotent operator T + 1
(restricted to $W_1$) is dim(Ker(T + 1)) = 1. This means that the Jordan form
of T must be

\[
\begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix} . \qquad ∥
\]
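The example deliberately avoids substituting A into the two candidate polynomials, but that direct check is short and confirms the conclusion. The sketch below assumes the matrix A whose columns are the images $T(v_1)$, $T(v_2)$, $T(v_3)$ given in the example; the helper names `shift` and `is_zero` are introduced here for illustration.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(A, c):
    """Return A - c*I."""
    return [[A[i][j] - (c if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]


def is_zero(M):
    return all(x == 0 for row in M for x in row)


# Matrix of T in the basis {v1, v2, v3}: column j holds the image of v_j.
A = [[-1, 3, 0],
     [0, 2, 0],
     [2, 1, -1]]

A_plus_1 = shift(A, -1)   # A + I
A_minus_2 = shift(A, 2)   # A - 2I

short = matmul(A_plus_1, A_minus_2)                    # (x + 1)(x - 2) at A
full = matmul(matmul(A_plus_1, A_plus_1), A_minus_2)   # (x + 1)^2 (x - 2) at A

print(is_zero(short))  # False
print(is_zero(full))   # True

# The Jordan form found in the example behaves the same way.
J = [[-1, 1, 0],
     [0, -1, 0],
     [0, 0, 2]]
J_short = matmul(shift(J, -1), shift(J, 2))
J_full = matmul(matmul(shift(J, -1), shift(J, -1)), shift(J, 2))
print(is_zero(J_short), is_zero(J_full))  # False True
```

Since $(A + I)(A - 2I) \neq 0$ while $(A + I)^2(A - 2I) = 0$, the minimal polynomial is $(x + 1)^2(x - 2)$, in agreement with the kernel computation above.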



Exercises 



1. Let $V = \mathbb{C}^4$ and suppose $T \in L(V)$. If T has only one
eigenvalue $\lambda$ of multiplicity 4, describe all possible Jordan forms of T.

2. Let $V = \mathbb{C}^n$ and suppose
$m(x) = (x - \lambda_1)^{n_1} \cdots (x - \lambda_r)^{n_r}$ is the minimal
polynomial for $T \in L(V)$. Let $A = A_1 \oplus \cdots \oplus A_r$ be the Jordan
form of T. Prove directly from the structure of the $A_i$ that the largest
Jordan block belonging to $\lambda_i$ has size $n_i \times n_i$.

3. Let $V = \mathbb{C}^n$. If the Jordan form of $T \in L(V)$ consists of just
one Jordan block (counting 1 × 1 blocks), what is the Jordan form of $T^2$?
Explain.

4. Let $V = \mathbb{C}^n$, suppose $T \in L(V)$, and let $\lambda$ be an
eigenvalue of T. What is the relationship between the number of Jordan blocks
belonging to $\lambda$ and the rank of $T - \lambda I$? Explain.

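5. Let $V = \mathbb{C}^4$, and suppose that each matrix below represents
$T \in L(V)$ relative to the standard basis. Determine the Jordan form of T.
[Hint: Use the previous exercise.]

(a) [4 × 4 matrix; the zero entries are lost, and the surviving entries read
row by row: 1 -1 -1 -1 / 1 -1 -1 / 2 / 2]

(b)
\[
\begin{pmatrix}
1 & -1 & -1 & -1 \\
0 & 1 & -1 & -1 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 2
\end{pmatrix}
\]

(c) [4 × 4 matrix; the zero entries are lost, and the surviving entries read
row by row: 1 -1 -1 -1 / -1 1 -1 -1 / 2 / 2]

(d)
\[
\begin{pmatrix}
1 & -1 & -1 & -1 \\
-1 & 1 & -1 & -1 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 2
\end{pmatrix}
\]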



CHAPTER 8 



Canonical Forms 



Recall that at the beginning of Section 7.5 we stated that a canonical form for 
$T \in L(V)$ is simply a representation in which the matrix takes on an especially
simple form. For example, if there exists a basis of eigenvectors of T, then the 
matrix representation will be diagonal. In this case, it is then quite trivial to 
assess the various properties of T such as its rank, determinant and eigenval- 
ues. Unfortunately, while this is generally the most desirable form for a matrix 
representation, it is also generally impossible to achieve. 

We now wish to determine certain properties of T that will allow us to 
learn as much as we can about the possible forms its matrix representation can 
take. There are three major canonical forms that we will consider in this 
chapter: triangular, rational and Jordan. (This does not count the Smith form,
which is really a tool, used to find the rational and Jordan forms.) As we have 
done before, our approach will be to study each of these forms in more than 
one way. By so doing, we shall gain much insight into their meaning, as well 
as learning additional techniques that are of great use in various branches of 
mathematics. 



8.1 ELEMENTARY CANONICAL FORMS 

In order to ease into the subject, this section presents a simple and direct 
method of treating two important results: the triangular form for complex 
matrices and the diagonalization of normal matrices. To begin with, suppose
that we have a matrix $A \in M_n(\mathbb{C})$. We define the adjoint (or
Hermitian adjoint) of A to be the matrix $A^\dagger = A^{*T}$. In other words,
the adjoint of A is its complex conjugate transpose. From Theorem 3.18(d), it
is easy to see that

\[
(AB)^\dagger = B^\dagger A^\dagger .
\]

If it so happens that $A^\dagger = A$, then A is said to be a Hermitian matrix.

If a matrix $U \in M_n(\mathbb{C})$ has the property that
$U^\dagger = U^{-1}$, then we say that U is unitary. Thus a matrix U is unitary
if $UU^\dagger = U^\dagger U = I$. (Note that by Theorem 3.21, it is only
necessary to require either $UU^\dagger = I$ or $U^\dagger U = I$.) We also see
that the product of two unitary matrices U and V is unitary since
$(UV)^\dagger UV = V^\dagger U^\dagger UV = V^\dagger IV = V^\dagger V = I$. If a
matrix $N \in M_n(\mathbb{C})$ has the property that it commutes with its
adjoint, i.e., $NN^\dagger = N^\dagger N$, then N is said to be a normal matrix.
Note that Hermitian and unitary matrices are automatically normal.

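Example 8.1 Consider the matrix $A \in M_2(\mathbb{C})$ given by

\[
A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ i & i \end{pmatrix} .
\]

Then the adjoint of A is given by

\[
A^\dagger = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ -1 & -i \end{pmatrix}
\]

and we leave it to the reader to verify that $AA^\dagger = A^\dagger A = I$, and
hence show that A is unitary. ∥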

We will devote considerable time in Chapter 10 to the study of these 
matrices. However, for our present purposes, we wish to point out one
important property of unitary matrices. Note that since
$U \in M_n(\mathbb{C})$, the rows $U_i$ and columns $U^i$ of U are just vectors
in $\mathbb{C}^n$. This means that we can take their inner product relative to
the standard inner product on $\mathbb{C}^n$ (see Example 2.9). Writing out the
relation $UU^\dagger = I$ in terms of components, we have

\[
(UU^\dagger)_{ij} = \sum_{k=1}^{n} u_{ik} u^\dagger_{kj}
 = \sum_{k=1}^{n} u_{ik} u_{jk}^{*}
 = \sum_{k=1}^{n} u_{jk}^{*} u_{ik}
 = \langle U_j, U_i \rangle = \delta_{ij}
\]

and from $U^\dagger U = I$ we see that

\[
(U^\dagger U)_{ij} = \sum_{k=1}^{n} u^\dagger_{ik} u_{kj}
 = \sum_{k=1}^{n} u_{ki}^{*} u_{kj}
 = \langle U^i, U^j \rangle = \delta_{ij} .
\]

In other words, a matrix is unitary if and only if its rows (or columns) each
form an orthonormal set. Note we have shown that if the rows (columns) of
$U \in M_n(\mathbb{C})$ form an orthonormal set, then so do the columns (rows),
and either of these is a sufficient condition for U to be unitary. For example,
the reader can easily verify that the matrix A in Example 8.1 satisfies these
conditions.
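The equivalence between unitarity and orthonormal rows or columns can be checked directly for a small matrix. The sketch below is illustrative only: the 2 x 2 matrix U is a sample chosen here, the inner product is taken to be conjugate-linear in its first argument (matching the component formulas above), and comparisons use a small floating point tolerance because of the factor $1/\sqrt{2}$.

```python
import math


def adjoint(A):
    """Complex conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(M)) for j in range(len(M)))


def orthonormal(vectors, tol=1e-12):
    return all(abs(inner(u, v) - (1 if i == j else 0)) < tol
               for i, u in enumerate(vectors) for j, v in enumerate(vectors))


# Sample matrix chosen for illustration: (1/sqrt 2) * [[1, -1], [i, i]].
s = 1 / math.sqrt(2)
U = [[s, -s], [s * 1j, s * 1j]]

rows = U
cols = [[U[i][j] for i in range(2)] for j in range(2)]

print(is_identity(matmul(U, adjoint(U))))   # True
print(is_identity(matmul(adjoint(U), U)))   # True
print(orthonormal(rows), orthonormal(cols)) # True True
```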

It is also worth pointing out that Hermitian and unitary matrices have 
important analogues over the real number system. If $A \in M_n(\mathbb{R})$ is
Hermitian, then $A = A^\dagger = A^T$, and we say that A is symmetric. If
$U \in M_n(\mathbb{R})$ is unitary, then $U^{-1} = U^\dagger = U^T$, and we say
that U is orthogonal. Repeating the above calculations over $\mathbb{R}$, it is
easy to see that a real matrix is orthogonal if and only if its rows (or
columns) form an orthonormal set.

It will also be useful to recall from Section 3.6 that if A and B are two
matrices for which the product AB is defined, then the ith row of AB is given
by $(AB)_i = A_i B$ and the ith column of AB is given by $(AB)^i = AB^i$. We now
prove yet another version of the triangular form theorem.

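Theorem 8.1 (Schur Canonical Form) If $A \in M_n(\mathbb{C})$, then there
exists a unitary matrix $U \in M_n(\mathbb{C})$ such that $U^\dagger A U$ is
upper-triangular. Furthermore, the diagonal entries of $U^\dagger A U$ are just
the eigenvalues of A.

Proof If n = 1 there is nothing to prove, so we assume that the theorem holds
for any square matrix of size $n - 1 \ge 1$, and suppose A is of size n. Since
we are dealing with the algebraically closed field $\mathbb{C}$, we know that A
has n (not necessarily distinct) eigenvalues (see Section 7.3). Let $\lambda$
be one of these eigenvalues, and denote the corresponding eigenvector by
$\tilde{U}^1$. By Theorem 2.10 we extend $\tilde{U}^1$ to a basis for
$\mathbb{C}^n$, and by the Gram-Schmidt process (Theorem 2.21) we assume that
this basis is orthonormal. From our discussion above, we see that this basis
may be used as the columns of a unitary matrix $\tilde{U}$ with $\tilde{U}^1$ as
its first column. We then see that

\[
(\tilde{U}^\dagger A \tilde{U})^1 = \tilde{U}^\dagger (A \tilde{U})^1
 = \tilde{U}^\dagger (A \tilde{U}^1) = \lambda\, \tilde{U}^\dagger \tilde{U}^1
 = \lambda (\tilde{U}^\dagger \tilde{U})^1 = \lambda I^1
\]

and hence $\tilde{U}^\dagger A \tilde{U}$ has the form

\[
\tilde{U}^\dagger A \tilde{U} =
\begin{pmatrix}
\lambda & * & \cdots & * \\
0       &   &        &   \\
\vdots  &   & B      &   \\
0       &   &        &
\end{pmatrix}
\]

where $B \in M_{n-1}(\mathbb{C})$ and the *'s are (in general) nonzero scalars.
By our induction hypothesis, we may choose a unitary matrix
$W \in M_{n-1}(\mathbb{C})$ such that $W^\dagger B W$ is upper-triangular. Let
$V \in M_n(\mathbb{C})$ be a unitary matrix of the form

\[
V = \begin{pmatrix} 1 & 0 \\ 0 & W \end{pmatrix}
\]

and define the unitary matrix $U = \tilde{U} V \in M_n(\mathbb{C})$. Then

\[
U^\dagger A U = (\tilde{U} V)^\dagger A (\tilde{U} V)
 = V^\dagger (\tilde{U}^\dagger A \tilde{U}) V
\]

is upper-triangular since (in an obvious shorthand notation)

\[
V^\dagger (\tilde{U}^\dagger A \tilde{U}) V
 = \begin{pmatrix} 1 & 0 \\ 0 & W^\dagger \end{pmatrix}
   \begin{pmatrix} \lambda & * \\ 0 & B \end{pmatrix}
   \begin{pmatrix} 1 & 0 \\ 0 & W \end{pmatrix}
 = \begin{pmatrix} 1 & 0 \\ 0 & W^\dagger \end{pmatrix}
   \begin{pmatrix} \lambda & * \\ 0 & BW \end{pmatrix}
 = \begin{pmatrix} \lambda & * \\ 0 & W^\dagger B W \end{pmatrix}
\]

and $W^\dagger B W$ is upper-triangular by the induction hypothesis.

Noting that $\lambda I - U^\dagger A U$ is upper-triangular, it is easy to see
(using Theorem 4.5) that the roots of $\det(\lambda I - U^\dagger A U)$ are just
the diagonal entries of $U^\dagger A U$. But

\[
\det(\lambda I - U^\dagger A U) = \det[U^\dagger (\lambda I - A) U]
 = \det(\lambda I - A)
\]

so that A and $U^\dagger A U$ have the same eigenvalues. ∎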

Corollary If $A \in M_n(\mathbb{R})$ has all its eigenvalues in $\mathbb{R}$,
then the matrix U defined in Theorem 8.1 may be chosen to have all real entries.

Proof If $\lambda \in \mathbb{R}$ is an eigenvalue of A, then $A - \lambda I$
is a real matrix with determinant $\det(A - \lambda I) = 0$, and therefore the
homogeneous system of equations $(A - \lambda I)X = 0$ has a nonzero real
solution. Taking this X as the first column of the unitary matrix constructed
in the proof of Theorem 8.1, we may now proceed as in that proof. The details
are left to the reader (see Exercise 8.1.1). ∎

We say that two matrices $A, B \in M_n(\mathbb{C})$ are unitarily similar
(written A ~ B) if there exists a unitary matrix U such that
$B = U^\dagger A U = U^{-1} A U$. Since this
defines an equivalence relation on the set of all matrices in
$M_n(\mathbb{C})$, many
authors say that A and B are unitarily equivalent. However, we will be using 
the term "equivalent" in a somewhat more general context later in this 
chapter, and the word "similar" is in accord with our earlier terminology. 

We leave it to the reader to show that if A and B are unitarily similar and
A is normal, then B is also normal (see Exercise 8.1.2). In particular, suppose
that U is unitary and N is such that $U^\dagger N U = D$ is diagonal. Since any
diagonal matrix is automatically normal, it follows that N must be normal also.
We now show that the converse is also true, i.e., that any normal matrix is
unitarily similar to a diagonal matrix.

To see this, suppose N is normal, and let $U^\dagger N U = D$ be the Schur
canonical form of N. Then D is both upper-triangular and normal (since it is
unitarily similar to a normal matrix). We claim that the only such matrices are
diagonal. For, consider the (1, 1) elements of $DD^\dagger$ and $D^\dagger D$.
From what we showed above, we have

\[
(DD^\dagger)_{11} = \langle D_1, D_1 \rangle
 = |d_{11}|^2 + |d_{12}|^2 + \cdots + |d_{1n}|^2
\]

and

\[
(D^\dagger D)_{11} = \langle D^1, D^1 \rangle
 = |d_{11}|^2 + |d_{21}|^2 + \cdots + |d_{n1}|^2 .
\]

But D is upper-triangular so that $d_{21} = \cdots = d_{n1} = 0$. By normality
we must have $(DD^\dagger)_{11} = (D^\dagger D)_{11}$, and therefore
$d_{12} = \cdots = d_{1n} = 0$ also. In other words, with the possible exception
of the (1, 1) entry, all entries in the first row and column of D must be zero.
In the same manner, we see that

\[
(DD^\dagger)_{22} = \langle D_2, D_2 \rangle
 = |d_{21}|^2 + |d_{22}|^2 + \cdots + |d_{2n}|^2
\]

and

\[
(D^\dagger D)_{22} = \langle D^2, D^2 \rangle
 = |d_{12}|^2 + |d_{22}|^2 + \cdots + |d_{n2}|^2 .
\]

Since the fact that D is upper-triangular means $d_{32} = \cdots = d_{n2} = 0$
and we just showed that $d_{21} = d_{12} = 0$, it again follows by normality
that $d_{23} = \cdots = d_{2n} = 0$. Thus all entries in the second row and
column with the possible exception of the (2, 2) entry must be zero.

Continuing this procedure, it is clear that D must be diagonal as claimed.
In other words, an upper-triangular normal matrix is necessarily diagonal.
This discussion proves the following very important theorem.

Theorem 8.2 A matrix $N \in M_n(\mathbb{C})$ is normal if and only if there
exists a unitary matrix U such that $U^\dagger N U$ is diagonal.
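The key step in the argument leading to Theorem 8.2 compares the (1, 1) entries of $DD^\dagger$ and $D^\dagger D$. The sketch below works this out for two small matrices chosen here purely for illustration: an upper-triangular matrix D with a nonzero entry above the diagonal, which fails to be normal, and a diagonal matrix E, which is normal. The helper `is_normal` is a name introduced for this example.

```python
def adjoint(A):
    """Complex conjugate transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def is_normal(M):
    return matmul(M, adjoint(M)) == matmul(adjoint(M), M)


# Upper-triangular but not diagonal (illustrative choice).
D = [[1, 1],
     [0, 1]]
DDt = matmul(D, adjoint(D))
DtD = matmul(adjoint(D), D)

# (D D^dagger)_11 = |d11|^2 + |d12|^2, while (D^dagger D)_11 = |d11|^2 + |d21|^2.
print(DDt[0][0], DtD[0][0])  # 2 1
print(is_normal(D))          # False

# A diagonal matrix with complex entries is normal.
E = [[2j, 0],
     [0, 3]]
print(is_normal(E))          # True
```

The mismatch in the (1, 1) entries is caused entirely by $d_{12} \neq 0$, which is the entry that normality forces to vanish in the argument above.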

Corollary If $A = (a_{ij}) \in M_n(\mathbb{R})$ is symmetric, then there exists
an orthogonal matrix S such that $S^T A S$ is diagonal.

Proof If we can show that a real symmetric matrix has all real eigenvalues,
then this corollary will follow from the corollary to Theorem 8.1 and the real
analogue of the proof of Theorem 8.2. Now suppose $A = A^T$ so that
$a_{ij} = a_{ji}$. If $\lambda$ is an eigenvalue of A, then there exists a
(nonzero and not necessarily real) vector $x \in \mathbb{C}^n$ such that
$Ax = \lambda x$ or

\[
\sum_{j=1}^{n} a_{ij} x_j = \lambda x_i . \qquad (1)
\]

Multiplying (1) by $x_i^{*}$, summing over i and using the standard inner
product on $\mathbb{C}^n$ we obtain

\[
\sum_{i,j=1}^{n} x_i^{*} a_{ij} x_j = \lambda \|x\|^2 . \qquad (2)
\]

On the other hand, we may take the complex conjugate of (1), then multiply
by $x_i$ and sum over i to obtain (since each $a_{ij}$ is real)

\[
\sum_{i,j=1}^{n} x_i a_{ij} x_j^{*} = \lambda^{*} \|x\|^2 . \qquad (3)
\]

But $a_{ij} = a_{ji}$ and therefore the left hand side of (3) becomes

\[
\sum_{i,j=1}^{n} x_i a_{ij} x_j^{*} = \sum_{i,j=1}^{n} x_j^{*} a_{ji} x_i
 = \sum_{i,j=1}^{n} x_i^{*} a_{ij} x_j
\]

where in the last step we relabelled the index i by j and the index j by i.
Since this shows that the left hand sides of (2) and (3) are equal, and since
$\|x\|^2 \neq 0$, it follows that $\lambda = \lambda^{*}$ as claimed. ∎

We will return to this theorem in Chapter 10 where it will be proved in an 
entirely different manner. 

Exercises 

1. Finish the proof of the corollary to Theorem 8.1.

2. Show that if $A, B \in M_n(\mathbb{C})$ are unitarily similar and A is
normal, then B is also normal.

3. Suppose $A, B \in M_n(\mathbb{C})$ commute (i.e., AB = BA).

(a) Prove there exists a unitary matrix U such that $U^\dagger A U$ and
$U^\dagger B U$ are both upper-triangular. [Hint: Let
$V_\lambda \subset \mathbb{C}^n$ be the eigenspace of B corresponding to the
eigenvalue $\lambda$. Show that $V_\lambda$ is invariant under A, and hence show
that A and B have a common eigenvector. Now proceed as in the proof of
Theorem 8.1.]

(b) Show that if A and B are also normal, then there exists a unitary
matrix U such that $U^\dagger A U$ and $U^\dagger B U$ are diagonal.

4. Can every matrix $A \in M_n(\mathbb{C})$ be written as a product of two
unitary matrices? Explain.

5. (a) Prove that if H is Hermitian, then det H is real. 

(b) Is it the case that every square matrix A can be written as the product 
of finitely many Hermitian matrices? Explain. 

6. A matrix M is skew-Hermitian if $M^\dagger = -M$.

(a) Show that skew-Hermitian matrices are normal. 

(b) Show that any square matrix A can be written as a sum of a skew- 
Hermitian matrix and a Hermitian matrix. 

7. Describe all diagonal unitary matrices. Prove that any n x n diagonal 
matrix can be written as a finite sum of unitary diagonal matrices. [Hint: 
Do the cases n = 1 and n = 2 to get the idea.] 

8. Using the previous exercise, show that any n x n normal matrix can be 
written as the sum of finitely many unitary matrices. 

9. If A is unitary, does this imply that $\det A^k = 1$ for some integer k?
What if A is a real unitary matrix (i.e., orthogonal)?

10. (a) Is an n × n matrix A that is similar (but not necessarily unitarily
similar) to a Hermitian matrix necessarily Hermitian?

(b) If A is similar to a normal matrix, is A necessarily normal? 

11. If N is normal and $Nx = \lambda x$, prove that
$N^\dagger x = \lambda^{*} x$. [Hint: First treat the case where N is diagonal.]

12. Does the fact that A is similar to a diagonal matrix imply that A is 
normal? 

13. Discuss the following conjecture: If $N_1$ and $N_2$ are normal, then
$N_1 + N_2$ is normal if and only if $N_1 N_2^\dagger = N_2^\dagger N_1$.

14. (a) If $A \in M_n(\mathbb{R})$ is nonzero and skew-symmetric, show that A
can not have any real eigenvalues.

(b) What can you say about the eigenvalues of such a matrix? 

(c) What can you say about the rank of A? 

15. Let $\sigma \in S_n$ be a permutation, and let
$f: \{1, \ldots, n\} \to \{+1, -1\}$. Define the signed permutation matrix
$P_\sigma^f$ by

[case definition of the entries of $P_\sigma^f$ lost here; only the final case
"otherwise" survives]

Show that signed permutation matrices are orthogonal.

16. (a) Prove that a real n × n matrix A that commutes with all n-square real
orthogonal matrices is a multiple of $I_n$. [Hint: Show that the matrices
$E_{ij}$ of Section 3.6 can be represented as sums of signed permutation
matrices.]

(b) What is true for a complex matrix that commutes with all unitary 
matrices? 

8.2 MATRICES OVER THE RING OF POLYNOMIALS 

For the remainder of this chapter we will be discussing matrices with polyno- 
mial entries. Unfortunately, this requires some care since the ring of polyno- 
mials does not form a field (see Theorem 6.2, Corollary 3). However, the 
reader should recall that it is possible to embed $\mathcal{F}[x]$ (or any
integral domain for that matter) in a field of quotients as we saw in Section
6.5 (see Theorem 6.16). This simply means that quotients (i.e., rational
functions) such as f(x)/g(x) are defined (if $g \neq 0$), along with their
inverses g(x)/f(x) (if $f \neq 0$).

First of all, we will generally restrict ourselves to only the real and
complex number fields. In other words, $\mathcal{F}$ will be taken to mean
either $\mathbb{R}$ or $\mathbb{C}$ unless otherwise stated. Next, we introduce
some additional simplifying notation. We denote $\mathcal{F}[x]$ (the ring of
polynomials) by $\mathcal{P}$, and the associated field of quotients by
$\mathcal{R}$ (think of $\mathcal{P}$ as meaning polynomial and $\mathcal{R}$ as
meaning ratio). Thus, an m × n matrix with polynomial entries is an element of
$M_{m \times n}(\mathcal{P})$, and an m × n matrix over the field of quotients
is an element of $M_{m \times n}(\mathcal{R})$. Note that
$M_{m \times n}(\mathcal{P})$ is actually a subset of
$M_{m \times n}(\mathcal{R})$ since any polynomial p(x) may be written as
p(x)/1.

It is important to realize that since $\mathcal{R}$ is a field, all of our
previous results apply to $M_{m \times n}(\mathcal{R})$ just as they do to
$M_{m \times n}(\mathcal{F})$. However, we need to reformulate some of our
definitions in order to handle $M_{m \times n}(\mathcal{P})$. In other words, as
long as we allow all operations in $\mathcal{R}$ there is no problem. Where we
must be careful is when we restrict ourselves to multiplication by polynomials
only (rather than by rational functions). To begin with, we must modify the
definition of elementary row and column operations that we gave in Section 3.2.
In particular, we now define the $\mathcal{P}$-elementary row (column)
operations as follows. The type $\alpha$ operation remains the same, the type
$\beta$ operation is multiplication by a nonzero $c \in \mathcal{F}$, and the
type $\gamma$ operation is now taken to be the addition of a polynomial
multiple of one row (column) to another. In other words, if $A_i$ is the ith
row of $A \in M_{m \times n}(\mathcal{P})$, then the $\mathcal{P}$-elementary
operations are:

($\alpha$) Interchange $A_i$ and $A_j$.
($\beta$) $A_i \to cA_i$ where $0 \neq c \in \mathcal{F}$.
($\gamma$) $A_i \to A_i + pA_j$ where $p \in \mathcal{P}$.



With these modifications, it is easy to see that all of our discussion on the
techniques of reduction to row-echelon form remains valid, although now the
distinguished elements of the matrix (i.e., the first nonzero entry in each
row) will in general be polynomials (which we will assume to be monic). In
other words, the row-echelon form of a matrix A ∈ M_n(𝒫) will in general be an
upper-triangular matrix in M_n(𝒫) (which may, however, have zeros on the main
diagonal). However, if A ∈ M_n(𝒫) is nonsingular, then r(A) = n, and the
row-echelon form of A will be upper-triangular with nonzero monic polynomials
down the main diagonal. (This is true since M_n(𝒫) ⊂ M_n(ℛ), and hence all of
our results dealing with the rank remain valid for elements of M_n(𝒫).) In
other words, the row-echelon form of A ∈ M_n(𝒫) will be

$$
\begin{pmatrix}
p_{11} & p_{12} & p_{13} & \cdots & p_{1n} \\
0 & p_{22} & p_{23} & \cdots & p_{2n} \\
0 & 0 & p_{33} & \cdots & p_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & p_{nn}
\end{pmatrix}
$$

where each p_ij ∈ 𝒫.



Example 8.2 Let us illustrate the basic approach in applying 𝒫-elementary
operations. For notational simplicity we will consider only the first column
of a matrix A ∈ M_3(𝒫). Thus, suppose we have

$$
\begin{pmatrix} x^2 - 2x + 1 \\ x - 1 \\ x^2 + 2 \end{pmatrix}.
$$

Multiplying the second row by -x and adding to the third yields

$$
\begin{pmatrix} x^2 - 2x + 1 \\ x - 1 \\ x + 2 \end{pmatrix}.
$$

Adding -1 times the second row to the third and then multiplying the third by
1/3 now yields

$$
\begin{pmatrix} x^2 - 2x + 1 \\ x - 1 \\ 1 \end{pmatrix}.
$$

Adding -(x - 1) times the third row to the second, and -(x² - 2x + 1) times
the third to the first gives us

$$
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
$$

Finally, interchanging rows 1 and 3 will put this into row-echelon form. Note
that while we came up with a field element in this last form, we could have
ended up with some other nonconstant polynomial.

We now repeat this procedure on column 2, but only on rows 2 and 3 since only
these rows have zeros in the first column. This results in a matrix that will
in general have nonzero elements in row 1 of column 1, in rows 1 and 2 of
column 2, and in all three rows of column 3. It should now be clear that when
applied to any A ∈ M_n(𝒫), this procedure will result in an upper-triangular
matrix. //
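The column reduction of Example 8.2 can be sketched in code. The following is only an illustration under stated assumptions: polynomials are stored as coefficient lists with the lowest degree first, rational coefficients (Python's `Fraction`) stand in for the field ℱ, and the helper names `trim`, `sub_mul`, `quotient` and `reduce_column` are hypothetical. The pivot is chosen by least degree, so the individual steps may differ from those in the example even though the final column agrees.

```python
from fractions import Fraction

def trim(p):
    """Normalise a coefficient list (lowest degree first); zero is []."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def sub_mul(a, q, b):
    """Return the polynomial a - q*b."""
    out = trim(a) + [Fraction(0)] * (len(q) + len(b))
    for i, x in enumerate(q):
        for j, y in enumerate(b):
            out[i + j] -= x * y
    return trim(out)

def quotient(a, b):
    """Polynomial quotient of a by the nonzero polynomial b."""
    q, r, b = [], trim(a), trim(b)
    while len(r) >= len(b):
        term = [0] * (len(r) - len(b)) + [r[-1] / b[-1]]
        q = sub_mul(q, [-1], term)      # q += term
        r = sub_mul(r, term, b)         # cancel the leading term of r
    return q

def reduce_column(col):
    """Reduce a column of polynomials using only P-elementary row operations."""
    col = [trim(p) for p in col]
    while sum(1 for p in col if p) > 1:
        # pivot: a nonzero entry of least degree
        k = min((i for i, p in enumerate(col) if p), key=lambda i: len(col[i]))
        for i in range(len(col)):
            if i != k and col[i]:
                # type gamma: row_i -> row_i - q * row_k leaves the remainder
                col[i] = sub_mul(col[i], quotient(col[i], col[k]), col[k])
    k = next((i for i, p in enumerate(col) if p), None)
    if k is not None:
        col[0], col[k] = col[k], col[0]              # type alpha
        col[0] = [c / col[0][-1] for c in col[0]]    # type beta: make it monic
    return col

# First column of Example 8.2: x^2 - 2x + 1, x - 1, x^2 + 2
example = reduce_column([[1, -2, 1], [-1, 1], [2, 0, 1]])
```

Here `example` holds the constant polynomial 1 in the first row and zeros below it, which is the row-echelon column reached in the example after the final interchange.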

A moment's thought should convince the reader that it will not be possible in
general to transform a matrix in M_n(𝒫) to reduced row-echelon form if we
allow only 𝒫-elementary operations. For example, if the row-echelon form of
A ∈ M_2(𝒫) is

$$
\begin{pmatrix} x^2 + 1 & 2x - 3 \\ 0 & \cdots \end{pmatrix}
$$

then it is impossible to add any polynomial multiple of the second row to the
first to eliminate the 2x - 3 term. This is exactly the type of difference
that occurs between operations in the ring 𝒫 and those in the field ℛ.

It should also be clear that we can define 𝒫-elementary matrices in the
obvious way, and that each 𝒫-elementary matrix is also in M_n(𝒫). Moreover,
each 𝒫-elementary matrix has an inverse which is also in M_n(𝒫), as is its
transpose (see Theorem 3.23). In addition, Theorem 4.5 remains valid for
matrices over 𝒫, as does Theorem 4.4 since replacing row A_i by A_i + pA_j
where p is a polynomial also has no effect on det A. This shows that if we
reduce a matrix A ∈ M_n(𝒫) to its row-echelon form Ã, then the fact that Ã is
upper-triangular means that

$$
\det \tilde{A} = k \det A
$$

where k is a unit in 𝒫 (recall from Example 6.4 that the units of the ring
𝒫 = ℱ[x] are just the nonzero elements of ℱ, i.e., the nonzero constant
polynomials). We will refer to units of 𝒫 as (nonzero) scalars.

We say that a matrix A ∈ M_n(𝒫) is a unit matrix if A^{-1} exists and is also
an element of M_n(𝒫). (Do not confuse a unit matrix with the identity matrix.)
Note this is more restrictive than to say that A ∈ M_n(𝒫) is merely
invertible, because we now also require that A^{-1} have entries only in 𝒫,
whereas in general it could have entries in ℛ. From our discussion above, we
see that 𝒫-elementary matrices are also unit matrices. The main properties of
unit matrices that we shall need are summarized in the following theorem.

Theorem 8.3 If A ∈ M_n(𝒫) and Ã ∈ M_n(𝒫) is the row-echelon form of A, then

(a) A is a unit matrix if and only if A can be row-reduced to Ã = I.

(b) A is a unit matrix if and only if det A is a nonzero scalar.

(c) A is a unit matrix if and only if A is a product of 𝒫-elementary matrices.

Proof (a) If A is a unit matrix, then A^{-1} exists so that r(A) = n (Theorem
3.21). This means that the row-echelon form of A is an upper-triangular matrix
Ã = (p_ij) ∈ M_n(𝒫) with n nonzero diagonal entries. Since AA^{-1} = I, it
follows that (det A)(det A^{-1}) = 1 (Theorem 4.8 is also still valid) and
hence det A ≠ 0. Furthermore, since both det A and det A^{-1} are in 𝒫,
Theorem 6.2(b) shows us that deg(det A) = deg(det A^{-1}) = 0 and thus det A is
a scalar. Our discussion above showed that

$$
k \det A = \det \tilde{A} = \prod_{i=1}^{n} p_{ii}
$$

where k is a scalar, and therefore each polynomial p_ii must also be of degree
zero (i.e., a scalar). In this case we can apply 𝒫-elementary row operations
to further reduce Ã to the identity matrix I.

Conversely, if A is row-equivalent to Ã = I, then we may write

$$
E_1 \cdots E_r A = I
$$

where each E_i ∈ M_n(𝒫) is an elementary matrix. It follows that A^{-1} exists
and is given by A^{-1} = E_1 ⋯ E_r ∈ M_n(𝒫). Thus A is a unit matrix.

(b) If A is a unit matrix, then the proof of part (a) showed that det A is a
nonzero scalar. On the other hand, if det A is a nonzero scalar, then the
proof of part (a) showed that Ã = E_1 ⋯ E_r A = I, and hence
A^{-1} = E_1 ⋯ E_r ∈ M_n(𝒫) so that A is a unit matrix.

(c) If A is a unit matrix, then the proof of part (a) showed that A may be
written as a product of 𝒫-elementary matrices. Conversely, if A is the product
of 𝒫-elementary matrices, then we may write A = E_r^{-1} ⋯ E_1^{-1} ∈ M_n(𝒫).
Therefore A^{-1} = E_1 ⋯ E_r ∈ M_n(𝒫) also and hence A is a unit matrix. ∎
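Part (b) of Theorem 8.3 gives a direct computational test: a square polynomial matrix is a unit matrix exactly when its determinant is a nonzero constant. The sketch below assumes polynomials are stored as coefficient lists (lowest degree first, with `[]` for zero) over the rationals; the names `padd`, `pmul`, `pdet` and `is_unit_matrix` are hypothetical helpers, and the cofactor expansion is only practical for small matrices.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(a, b):
    """Sum of two polynomials."""
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def pmul(a, b):
    """Product of two polynomials."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def pdet(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return trim(M[0][0])
    total = []
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(trim(entry), pdet(minor))
        if j % 2:
            term = [-c for c in term]
        total = padd(total, term)
    return total

def is_unit_matrix(M):
    """Theorem 8.3(b): unit matrix iff det M is a nonzero constant."""
    return len(pdet(M)) == 1

ONE, ZERO, X = [1], [], [0, 1]
shear = [[ONE, X], [ZERO, ONE]]            # det = 1, so a unit matrix
scaled = [[X, ZERO], [ZERO, ONE]]          # det = x, invertible only over the quotient field
mixed = [[[1, 1], X], [X, [-1, 1]]]        # det = (x + 1)(x - 1) - x^2 = -1
```

The matrix `scaled` is invertible as a matrix of rational functions, but its inverse contains 1/x, which is why it fails the test.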

Recall from Section 5.4 that two matrices A, B ∈ M_n(ℱ) are said to be similar
if there exists a nonsingular matrix S ∈ M_n(ℱ) such that A = S^{-1}BS. In
order to generalize this, we say that two matrices A, B ∈ M_{m×n}(𝒫) are
equivalent over 𝒫 if there exist unit matrices P ∈ M_m(𝒫) and Q ∈ M_n(𝒫) such
that A = PBQ. The reader should have no trouble showing that this defines an
equivalence relation on the set of all m × n matrices over 𝒫.

Note that since P and Q are unit matrices, they may be written as a product of
𝒫-elementary matrices (Theorem 8.3). Now recall from our discussion at the end
of Section 3.8 that multiplying B from the right by an elementary matrix E has
the same effect on the columns of B as multiplying from the left by E^T does
on the rows. We thus conclude that if A and B are equivalent over 𝒫, then A is
obtainable from B by a sequence of 𝒫-elementary row and column operations.
Conversely, if A is obtainable from B by a sequence of 𝒫-elementary row and
column operations, the fact that each E_i ∈ M_n(𝒫) is a unit matrix means that
A and B are 𝒫-equivalent.

Theorem 8.4 (a) Two matrices A, B ∈ M_{m×n}(𝒫) are equivalent over 𝒫 if and
only if A can be obtained from B by a sequence of 𝒫-elementary row and column
operations.

(b) 𝒫-equivalent matrices have the same rank.

Proof (a) This was proved in the preceding discussion.

(b) Suppose A, B ∈ M_{m×n}(𝒫) and A = PBQ where P ∈ M_m(𝒫) and Q ∈ M_n(𝒫) are
unit matrices and hence nonsingular. Then, applying the corollary to Theorem
3.20, we have

$$
\begin{aligned}
r(A) = r(PBQ) &\le \min\{r(P),\, r(BQ)\} = \min\{m,\, r(BQ)\} = r(BQ) \\
&\le \min\{r(B),\, r(Q)\} = r(B) .
\end{aligned}
$$

Similarly, we see that r(B) = r(P^{-1}AQ^{-1}) ≤ r(A) and hence r(A) = r(B). ∎

Another point that should be clarified is the following computational
technicality that we will need to apply several times in the remainder of this
chapter. Referring to Section 6.1, we know that the product of two polynomials
p(x) = Σ_{i=0}^{m} a_i x^i and q(x) = Σ_{j=0}^{n} b_j x^j is given by

$$
p(x)q(x) = \sum_{k=0}^{m+n} \sum_{t=0}^{k} a_t x^t\, b_{k-t} x^{k-t}
$$

where we have been careful to write everything in its original order. In the
special case that x, a_i, b_j ∈ ℱ, this may be written in the more common and
simpler form

$$
p(x)q(x) = \sum_{k=0}^{m+n} c_k x^k
$$

where c_k = Σ_{t=0}^{k} a_t b_{k-t}. However, we will need to evaluate the
product of two polynomials when the coefficients as well as the indeterminate
x are matrices. In this case, none of the terms in the general form for pq can
be assumed to commute with each other, and we shall have to be very careful in
evaluating such products. We do, though, have the following useful special
case.

Theorem 8.5 Let p(x) = Σ_{i=0}^{m} a_i x^i and q(x) = Σ_{j=0}^{n} b_j x^j be
polynomials with (matrix) coefficients a_i, b_j ∈ M_s(ℱ), and let
r(x) = Σ_{k=0}^{m+n} c_k x^k where c_k = Σ_{t=0}^{k} a_t b_{k-t}. Then if
A ∈ M_s(ℱ) commutes with all of the b_j ∈ M_s(ℱ), we have p(A)q(A) = r(A).

Proof We simply compute using A b_j = b_j A:

$$
\begin{aligned}
p(A)q(A) &= \sum_{k=0}^{m+n} \sum_{t=0}^{k} a_t A^t\, b_{k-t} A^{k-t}
          = \sum_{k=0}^{m+n} \sum_{t=0}^{k} a_t b_{k-t} A^{k} \\
         &= \sum_{k=0}^{m+n} c_k A^k = r(A) . \qquad ∎
\end{aligned}
$$

What this theorem has shown us is that if A commutes with all of the b_j, then
we may use the simpler form for the product of two polynomials. As an
interesting application of this result, we now give yet another (very simple)
proof of the Cayley-Hamilton theorem. Suppose A ∈ M_n(ℱ), and consider its
characteristic matrix xI - A along with the characteristic polynomial
Δ_A(x) = det(xI - A). Writing equation (1b) of Section 4.3 in matrix notation
we obtain

$$
[\operatorname{adj}(xI - A)](xI - A) = \Delta_A(x)\, I .
$$

Now notice that any matrix with polynomial entries may be written as a
polynomial with (constant) matrix coefficients (see the proof of Theorem
7.10). Then adj(xI - A) is just a polynomial in x of degree n - 1 with
(constant) matrix coefficients, and xI - A is similarly a polynomial in x of
degree 1. Since A obviously commutes with I and A, we can apply Theorem 8.5
with p(x) = adj(xI - A) and q(x) = xI - A to obtain p(A)q(A) = Δ_A(A). But
q(A) = 0, and hence we find that Δ_A(A) = 0.
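The conclusion Δ_A(A) = 0 is easy to check numerically for a particular matrix. The sketch below is not the argument used in the text: it obtains the coefficients of det(xI - A) from the Faddeev-LeVerrier recursion (a technique swapped in here purely for convenience) and then evaluates the polynomial at A by Horner's rule. The function names are hypothetical, and exact rational arithmetic is assumed.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients c[0..n] of det(xI - A), lowest degree first,
    computed with the Faddeev-LeVerrier recursion."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += c[n - k + 1]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

def evaluate_at_matrix(c, A):
    """Horner evaluation of sum_k c[k] A^k."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        R = matmul(R, A)
        for i in range(n):
            R[i][i] += coeff
    return R

A = [[2, 1, 0], [0, 2, 0], [0, 0, 5]]
coeffs = char_poly(A)                  # (x - 2)^2 (x - 5) = x^3 - 9x^2 + 24x - 20
residual = evaluate_at_matrix(coeffs, A)
```

For this A the list `coeffs` is -20, 24, -9, 1 and `residual` is the zero matrix, as the Cayley-Hamilton theorem requires.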

The last technical point that we wish to address is the possibility of
dividing two polynomials with matrix coefficients. The reason that this is a
problem is that all of our work in Chapter 6 was based on the assumption that
we were dealing with polynomials over a field, and the set of all square
matrices of any fixed size certainly does not in general form a field.
Referring back to the proof of the division algorithm (Theorem 6.3), we see
that the process of dividing f(x) = a_m x^m + ⋯ + a_1 x + a_0 by
g(x) = b_n x^n + ⋯ + b_1 x + b_0 depends on the existence of b_n^{-1}. This
then allows us to show that x - c is a factor of f(x) if and only if c is a
root of f(x) (Corollary to Theorem 6.4).

We would like to apply Theorem 6.4 to a special case of polynomials with
matrix coefficients. Thus, consider the polynomials
f(x) = B_n x^n + ⋯ + B_1 x + B_0 and g(x) = xI - A where A, B_i ∈ M_n(ℱ). In
this case, I is obviously invertible and we may divide g(x) into f(x) in the
usual manner. The first two terms of the quotient q(x) are then given by

$$
\begin{array}{rl}
 & B_n x^{n-1} + (B_{n-1} + B_n A)\,x^{n-2} \\
xI - A \;\big)\!\!\! & \overline{\,B_n x^n + B_{n-1} x^{n-1} + \cdots + B_1 x + B_0\,} \\
 & \qquad (B_{n-1} + B_n A)\,x^{n-1}
\end{array}
$$

It should now be clear (using Theorem 8.5) that Theorem 6.4 applies in this
special case, and if f(A) = 0, then we may write f(x) = q(x)(xI - A). In other
words, if A is a root of f(x), then xI - A is a factor of f(x). Note that in
order to divide f(x) = B_n x^n + ⋯ + B_0 by g(x) = A_m x^m + ⋯ + A_0, only the
leading coefficient A_m of g(x) need be invertible.

Let us also point out that because matrix multiplication is not generally
commutative, the order in which we multiply the divisor and quotient is
important when dealing with matrix coefficients. We will adhere to the
convention used in the above example.

Another point that we should take note of is the following. Two polynomials
p(x) = Σ_{k=0}^{m} A_k x^k and q(x) = Σ_{k=0}^{m} B_k x^k with coefficients in
M_n(ℱ) are defined to be equal if A_k = B_k for every k = 0, 1, . . . , m. For
example, recalling that x is just an indeterminate, we consider the polynomial
p(x) = A_0 + A_1 x = A_0 + xA_1. If C ∈ M_n(ℱ) does not commute with A_1
(i.e., CA_1 ≠ A_1C), then A_0 + A_1C ≠ A_0 + CA_1. This means that going from
an equality such as p(x) = q(x) to p(C) = q(C) must be done with care in that
the same convention for placing the indeterminate be applied to both p(x) and
q(x).
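A small numerical instance makes the point concrete. In the sketch below the matrices A0, A1 and C are arbitrary choices made for illustration (they do not come from the text), picked so that C and A1 do not commute; `matmul` and `matadd` are hypothetical helper names.

```python
def matmul(X, Y):
    """Matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

A0 = [[1, 0], [0, 1]]
A1 = [[0, 1], [0, 0]]
C = [[0, 0], [1, 0]]                 # chosen so that C*A1 != A1*C

right = matadd(A0, matmul(A1, C))    # p(C) with x written on the right: A0 + A1*C
left = matadd(A0, matmul(C, A1))     # p(C) with x written on the left:  A0 + C*A1
```

The two substitutions give different matrices even though A_0 + A_1 x and A_0 + xA_1 are the same polynomial, which is why one placement convention must be fixed before x is replaced by a matrix.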



Exercise 



Determine whether or not each of the following matrices is a unit matrix
by verifying each of the properties listed in Theorem 8.3:

[The entries of the three matrices (a), (b) and (c) are too garbled in this
copy to be recovered.]

If the reader has not studied (or does not remember) the Cauchy-Binet
theorem (Section 4.6), now is the time to go back and read it. We will need
this result several times in what follows, as well as the notation defined in
that section.

We know that the norm of any integer is just its absolute value, and the 
greatest common divisor of a set of nonzero integers is just the largest positive 
integer that divides them all. Similarly, we define the norm of any polynomial 
to be its degree, and the greatest common divisor (frequently denoted by 
gcd) of a set of nonzero polynomials is the polynomial of highest degree that 
divides all of them. By convention, we will assume that the gcd is monic (i.e., 
the leading coefficient is 1). 

Suppose A ∈ M_{m×n}(𝒫), and assume that 1 ≤ k ≤ min{m, n}. If A has at least
one nonzero k-square subdeterminant, then we define f_k to be the greatest
common divisor of all kth order subdeterminants of A. In other words,

$$
f_k = \gcd\{\det A[\alpha|\beta] : \alpha \in \mathrm{INC}(k, m),\ \beta \in \mathrm{INC}(k, n)\} .
$$

If there is no nonzero kth order subdeterminant, then we define f_k = 0.
Furthermore, for notational convenience we define f_0 = 1. The polynomials f_k
are called the determinantal divisors of A. We will sometimes write f_k(A) if
there is more than one matrix under consideration.



Example 8.3 Suppose

$$
A = \begin{pmatrix} x & 2x^2 & 1 \\ x^3 & x + 2 & x^2 \\ x + 2 & x - 1 & 0 \end{pmatrix} .
$$

Then the sets of nonzero 1-, 2- and 3-square subdeterminants are, respectively,

$$
\{x,\ 2x^2,\ 1,\ x^3,\ x + 2,\ x^2,\ x + 2,\ x - 1\}
$$

$$
\{-x(2x^4 - x - 2),\ 2x^4 - x - 2,\ x^4 - x^3 - x^2 - 4x - 4,\ -x^2(x + 2),\
-x^2(x - 1),\ -x(2x^2 + 3x + 1),\ -(x + 2),\ -(x - 1)\}
$$

$$
\{2x^5 + 4x^4 - x^2 - 4x - 4\}
$$

and hence f_1 = 1, f_2 = 1 and f_3 = x^5 + 2x^4 - (1/2)x^2 - 2x - 2. //
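The definition of f_k translates directly into a brute-force computation: form every k-square subdeterminant and take the monic gcd. The sketch below assumes coefficient lists (lowest degree first) with rational coefficients; all function names are hypothetical. It is run on the 3 × 3 matrix with rows (x, 2x², 1), (x³, x + 2, x²), (x + 2, x - 1, 0), whose determinant is 2x⁵ + 4x⁴ - x² - 4x - 4.

```python
from fractions import Fraction
from itertools import combinations

def trim(p):
    """Normalise a coefficient list (lowest degree first); zero is []."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def sub_mul(a, q, b):
    """Return the polynomial a - q*b."""
    out = trim(a) + [Fraction(0)] * (len(q) + len(b))
    for i, x in enumerate(q):
        for j, y in enumerate(b):
            out[i + j] -= x * y
    return trim(out)

def remainder(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    r, b = trim(a), trim(b)
    while len(r) >= len(b):
        term = [0] * (len(r) - len(b)) + [r[-1] / b[-1]]
        r = sub_mul(r, term, b)
    return r

def monic_gcd(polys):
    """Monic gcd of a list of polynomials ([] if all of them are zero)."""
    g = []
    for p in polys:
        p = trim(p)
        while p:                       # Euclidean algorithm on (g, p)
            g, p = p, remainder(g, p)
    return [c / g[-1] for c in g] if g else []

def det(M):
    """Cofactor expansion along the first row."""
    if len(M) == 1:
        return trim(M[0][0])
    total = []
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        q = trim(entry) if j % 2 else [-c for c in trim(entry)]
        total = sub_mul(total, q, det(minor))   # total += (-1)^j * entry * det(minor)
    return total

def determinantal_divisors(A):
    """Return [f_0, f_1, ..., f_min(m,n)] for a polynomial matrix A."""
    m, n = len(A), len(A[0])
    f = [[Fraction(1)]]                # f_0 = 1 by convention
    for k in range(1, min(m, n) + 1):
        minors = [det([[A[i][j] for j in cols] for i in rows])
                  for rows in combinations(range(m), k)
                  for cols in combinations(range(n), k)]
        f.append(monic_gcd(minors))
    return f

A = [[[0, 1], [0, 0, 2], [1]],
     [[0, 0, 0, 1], [2, 1], [0, 0, 1]],
     [[2, 1], [-1, 1], []]]
f = determinantal_divisors(A)
```

The number of subdeterminants grows quickly with the size of the matrix, so this is only a check on small cases and not a practical method.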
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Our next result contains two very simple but important properties of deter- 
minantal divisors. Recall that the notation p|q means p divides q. 

Theorem 8.6 (a) If f_k = 0, then f_{k+1} = 0.
(b) If f_k ≠ 0, then f_k | f_{k+1}.

Proof Using Theorem 6.6, it is easy to see that these are both immediate
consequences of Theorem 4.10 since a (k + 1)th order subdeterminant may be
written as a linear combination of kth order subdeterminants. ∎

If A ∈ M_{m×n}(𝒫) has rank r, then Theorem 4.12 tells us that f_r ≠ 0 while
f_{r+1} = 0. Hence, according to Theorem 8.6(b), we may define the quotients
q_k by

$$
f_k = q_k f_{k-1}
$$

for each k = 1, . . . , r. The polynomials q_k are called the invariant
factors of A. Note that f_0 = 1 implies f_1 = q_1, and hence

$$
f_k = q_k f_{k-1} = q_k q_{k-1} f_{k-2} = \cdots = q_k q_{k-1} \cdots q_1 .
$$

Because each f_k is defined to be monic, it follows that each q_k is also
monic. Moreover, the unique factorization theorem (Theorem 6.6) shows that
each q_k (k = 1, . . . , r) can be factored uniquely (except for order) into
products of powers of prime (i.e., irreducible) polynomials as

$$
q_k = p_1^{e_{k1}} p_2^{e_{k2}} \cdots p_s^{e_{ks}}
$$

where p_1, . . . , p_s are all the distinct prime factors of the invariant
factors, and each e_{ki} is a nonnegative integer. Of course, since every q_k
will not necessarily contain all of the p_i's as factors, some of the e_{ki}'s
may be zero.

Each of the factors p_i^{e_{ki}} for which e_{ki} > 0 is called an elementary
divisor of A. We count an elementary divisor once for each time that it
appears as a factor of an invariant factor. This is because a given elementary
divisor can appear as a factor in more than one invariant factor. Note also
that the elementary divisors clearly depend on the field under consideration
(see Example 6.7). However, the elementary divisors of a matrix over ℂ[x] are
always powers of linear polynomials (Theorem 6.13). As we shall see following
Theorem 8.8 below, the list of elementary divisors determines the list of
invariant factors, and hence the determinantal divisors.

Example 8.4 Let A_1 be the 3 × 3 matrix

$$
A_1 = \begin{pmatrix} x & -1 & 0 \\ 0 & x & -1 \\ -1 & 1 & x - 1 \end{pmatrix}
$$

and note that det A_1 = (x - 1)(x² + 1). Now consider the block diagonal matrix

$$
A = \begin{pmatrix} A_1 & 0 \\ 0 & A_1 \end{pmatrix} .
$$

Using Theorem 4.14, we immediately find

$$
f_6 = \det A = (x - 1)^2 (x^2 + 1)^2 .
$$

We now observe that every 5 × 5 submatrix of A is either block triangular
with a 3 × 3 matrix on its diagonal that contains one zero row (so the
determinant is zero), or else is block diagonal with A_1 as one of the blocks
(you should try to write out some of these and see this for yourself).
Therefore

$$
f_5 = (x - 1)(x^2 + 1)
$$

and hence

$$
q_6 = f_6 / f_5 = (x - 1)(x^2 + 1) .
$$

As to f_4, we see that some of the 4 × 4 subdeterminants contain det A_1 while
others (such as the one obtained by deleting both rows and columns 3 and 4)
do not contain any factors in common with this. Thus f_4 = 1 and we must have
q_5 = f_5. Since f_6 = q_1 q_2 ⋯ q_6, it follows that q_4 = q_3 = q_2 = q_1 = 1.

If we regard A as a matrix over ℝ[x], then the elementary divisors of A are
x - 1, x² + 1, x - 1, x² + 1. However, if we regard A as a matrix over ℂ[x],
then its elementary divisors are x - 1, x + i, x - i, x - 1, x + i, x - i. //

Theorem 8.7 Equivalent matrices have the same determinantal divisors.

Proof Suppose that A = PBQ. Applying the Cauchy-Binet theorem (the corollary
to Theorem 4.15), we see that any kth order subdeterminant of A is just a sum
of multiples of kth order subdeterminants of B. But then the gcd of all kth
order subdeterminants of B must divide all the kth order subdeterminants of A.
In other words, f_k(B) | f_k(A). Conversely, writing B = P^{-1}AQ^{-1} we see
that f_k(A) | f_k(B), and therefore f_k(A) = f_k(B). ∎

From Example 8.4 (which was based on a relatively simple block diagonal
matrix), it should be obvious that a brute force approach to finding invariant
factors leaves much to be desired. The proof of our next theorem is actually
nothing more than an algorithm for finding the invariant factors of any matrix
A. The matrix B defined in the theorem is called the Smith canonical (or
normal) form of A. After the proof, we give an example that should clarify
the various steps outlined in the algorithm.

Theorem 8.8 (Smith Canonical Form) Suppose A ∈ M_{m×n}(𝒫) has rank r. Then A
has precisely r + 1 nonzero determinantal divisors f_0, f_1, . . . , f_r, and A
is equivalent over 𝒫 to a unique diagonal matrix B = (b_ij) ∈ M_{m×n}(𝒫) with
b_ii = q_i = f_i/f_{i-1} for i = 1, . . . , r and b_ij = 0 otherwise. Moreover
q_i | q_{i+1} for each i = 1, . . . , r - 1.

Proof While we have already seen that A has precisely r + 1 nonzero
determinantal divisors, this will also fall out of the proof below.
Furthermore, the uniqueness of B follows from the fact that equivalence
classes are disjoint, along with Theorem 8.7 (because determinantal divisors
are defined to be monic). As to existence, we assume that A ≠ 0 or it is
already in Smith form. Note in the following that all we will do is perform a
sequence of 𝒫-elementary row and column operations on A. Recall that if E is
an elementary matrix, then EA represents the same elementary row operation
applied to A, and AE^T is the same operation applied to the columns of A.
Therefore, what we will finally arrive at is a matrix of the form B = PAQ
where P = E_{i_1} ⋯ E_{i_r} and Q = E_{j_1}^T ⋯ E_{j_s}^T. Recall also that the
norm of a polynomial is defined to be its degree.

Step 1. Search A for a nonzero entry of least norm and bring it to the (1, 1)
position by row and column interchanges. By subtracting the appropriate
multiples of row 1 from rows 2, . . . , m, we obtain a matrix in which every
element of column 1 below the (1, 1) entry is either 0 or of smaller norm than
the (1, 1) entry. Now perform the appropriate column operations to make every
element of row 1 to the right of the (1, 1) entry either 0 or of smaller norm
than the (1, 1) entry. Denote this new matrix by Ã.

Step 2. Search the first row and column of Ã for a nonzero entry of least
norm and bring it to the (1, 1) position. Now repeat the procedure of Step 1
to decrease the norm of every element of the first row and column outside the
(1, 1) position by at least 1. Repeating this step a finite number of times,
we must eventually arrive at a matrix A_1 equivalent to A which is 0
everywhere in the first row and column outside the (1, 1) position. Let us
denote the (1, 1) entry of A_1 by a.

Step 3. Suppose b is the (i, j) element of A_1 (where i, j > 1) and a ∤ b. If
no such b exists, then go on to Step 4. Put b in the (1, j) position by adding
row i to row 1. Since a ∤ b, we may write b = aq + r where r ≠ 0 and
deg r < deg a (Theorem 6.3). We place r in the (1, j) position by subtracting
q times column 1 from column j. This results in a matrix with an entry of
smaller norm than that of a. Now repeat Steps 1 and 2 with this matrix to
obtain a new matrix A_2 equivalent to A which is 0 everywhere in the first row
and column outside the (1, 1) position.

This process is repeated with A_2 to obtain A_3 and so forth. We thus obtain a
sequence A_1, A_2, . . . , A_s of matrices in which the norms of the (1, 1)
entries are strictly decreasing, and in which all elements of row 1 and column
1 are 0 outside the (1, 1) position. Furthermore, we go on from A_p to obtain
A_{p+1} only as long as there is an element of A_p(1|1) that is not divisible
by the (1, 1) element of A_p. Since the norms of the (1, 1) entries are
strictly decreasing, this process must terminate with a matrix
C = (c_ij) ∈ M_{m×n}(𝒫) equivalent to A and having the following properties:

(i) c_11 | c_ij for every i, j > 1;

(ii) c_1j = 0 for every j = 2, . . . , n;

(iii) c_i1 = 0 for every i = 2, . . . , m.

Step 4. Now repeat the entire procedure on the matrix C, except that this
time apply the 𝒫-elementary row and column operations to rows 2, . . . , m and
columns 2, . . . , n. This will result in a matrix D = (d_ij) that has all 0
entries in the first two rows and columns except for the (1, 1) and (2, 2)
entries. Since c_11 | c_ij (for i, j > 1), it follows that c_11 | d_ij for all
i, j. (This is true because every element of D is just a linear combination of
elements of C.) Thus the form of D is

$$
D = \begin{pmatrix}
c_{11} & 0 & 0 & \cdots & 0 \\
0 & d & 0 & \cdots & 0 \\
0 & 0 & & & \\
\vdots & \vdots & & G & \\
0 & 0 & & &
\end{pmatrix}
$$

where G = (g_ij) ∈ M_{(m-2)×(n-2)}(𝒫), c_11 | d and c_11 | g_ij for
i = 1, . . . , m - 2 and j = 1, . . . , n - 2. It is clear that we can continue
this process until we eventually obtain a diagonal matrix
H = (h_ij) ∈ M_{m×n}(𝒫) with the property that h_ii | h_{i+1 i+1} and
h_{i+1 i+1} ≠ 0 for i = 1, . . . , p - 1 (where p = rank H). But H is
equivalent to A so that H and A have the same determinantal divisors (Theorem
8.7) and p = r(H) = r(A) = r (Theorem 8.4(b)).

For each k with 1 ≤ k ≤ r, we observe that the only nonzero k-square
subdeterminants of H are of the form ∏_{s=1}^{k} h_{i_s i_s}, and the gcd of
all such products is

$$
f_k = \prod_{i=1}^{k} h_{ii}
$$

(since h_ii | h_{i+1 i+1} for i = 1, . . . , r - 1). But then applying the
definition of invariant factors, we see that

$$
\prod_{i=1}^{k} h_{ii} = f_k = \prod_{i=1}^{k} q_i .
$$

In particular, this shows us that

$$
h_{11} = q_1
$$

and hence h_22 = q_2, . . . , h_rr = q_r also. In other words, H is precisely
the desired matrix B. Finally, note that h_ii | h_{i+1 i+1} is just the
statement that q_i | q_{i+1}. ∎

Suppose A ∈ M_{m×n}(𝒫) has rank r, and suppose that we are given a list of all
the elementary divisors of A. From Theorem 8.8, we know that q_i | q_{i+1} for
i = 1, . . . , r - 1. Therefore, to compute the invariant factors of A, we
first multiply together the highest powers of all the distinct primes that
appear in the list of elementary divisors. This gives us q_r. Next, we
multiply together the highest powers of the remaining distinct primes to
obtain q_{r-1}. Continuing this process until the list of elementary divisors
is exhausted, suppose that q_k is the last invariant factor so obtained. If
k > 1, we then set q_1 = ⋯ = q_{k-1} = 1. The reader should try this on the
list of elementary divisors given at the end of Example 8.4.
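The bookkeeping just described can be sketched without any polynomial arithmetic by treating each prime as a label. In the following illustration the function name `invariant_factors`, the string labels for the primes, and the choice to return each q_k as a dictionary of exponents are all assumptions made for the sketch; an empty dictionary stands for q_k = 1.

```python
from collections import defaultdict

def invariant_factors(elementary_divisors, rank):
    """Return [q_1, ..., q_rank], each as a dict {prime: exponent}.

    `elementary_divisors` is a list of (prime, exponent) pairs, where `prime`
    is any hashable label for an irreducible polynomial, e.g. "x - 1".
    """
    powers = defaultdict(list)
    for prime, exponent in elementary_divisors:
        powers[prime].append(exponent)
    if any(len(exps) > rank for exps in powers.values()):
        raise ValueError("more copies of a prime than the rank allows")
    for exps in powers.values():
        exps.sort(reverse=True)              # highest power first
    factors = []
    for level in range(rank):                # level 0 builds q_r, level 1 builds q_{r-1}, ...
        q = {p: exps[level] for p, exps in powers.items() if level < len(exps)}
        factors.append(q)                    # an empty dict stands for q = 1
    return factors[::-1]                     # reorder as q_1, ..., q_r

# Elementary divisors of the matrix A of Example 8.4, regarded over R[x]
real_divisors = [("x - 1", 1), ("x^2 + 1", 1), ("x - 1", 1), ("x^2 + 1", 1)]
q = invariant_factors(real_divisors, rank=6)
```

With these inputs `q[4]` and `q[5]` both describe (x - 1)(x² + 1), while the first four entries are empty, in agreement with q_5 = q_6 = (x - 1)(x² + 1) and q_1 = ⋯ = q_4 = 1 found in Example 8.4.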

Corollary If A, B ∈ M_{m×n}(𝒫), then A is 𝒫-equivalent to B if and only if A
and B have the same invariant factors (or determinantal divisors or elementary
divisors).

Proof Let A_S and B_S be the Smith forms for A and B. If A and B have the
same invariant factors then they have the same Smith form. If we denote
𝒫-equivalence by ≈, then A ≈ A_S = B_S ≈ B so that A ≈ B. Conversely, if
A ≈ B then A ≈ B ≈ B_S implies that A ≈ B_S, and hence the uniqueness of A_S
implies that A_S = B_S, and thus A and B have the same invariant factors.

If we recall Theorem 6.6, then the statement for elementary divisors follows
immediately. Now note that f_0 = 1 so that f_1 = q_1, and in general we then
have f_k = q_k f_{k-1}. This takes care of the statement for determinantal
divisors. ∎
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Example 8.5 Consider the matrix A given in Example 7.3. We shall compute 
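the invariant factors of the associated characteristic matrix xI - A. The
reason for using the characteristic matrix will become clear in a later
section. According to Step 1, we obtain the following sequence of equivalent
matrices. Start with

$$
xI - A = \begin{pmatrix}
x - 2 & -1 & 0 & 0 \\
0 & x - 2 & 0 & 0 \\
0 & 0 & x - 2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

Put -1 in the (1, 1) position:

$$
\begin{pmatrix}
-1 & x - 2 & 0 & 0 \\
x - 2 & 0 & 0 & 0 \\
0 & 0 & x - 2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

Add x - 2 times row 1 to row 2, and x - 2 times column 1 to column 2:

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & (x - 2)^2 & 0 & 0 \\
0 & 0 & x - 2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

Since all entries in row 1 and column 1 are 0 except for the (1, 1) entry,
this last matrix is A_1 and we have also finished Step 2. Furthermore, there
is no element b ∈ A_1 that is not divisible by -1, so we go on to Step 4
applied to the 3 × 3 matrix in the lower right hand corner. In this case, we
first apply Step 1 and then follow Step 3. We thus obtain the following
sequence of matrices. Put x - 2 in the (2, 2) position:

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & x - 2 & 0 & 0 \\
0 & 0 & (x - 2)^2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

(x - 2) ∤ (x - 5) so add row 4 to row 2:

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & x - 2 & 0 & x - 5 \\
0 & 0 & (x - 2)^2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

Note x - 5 = 1(x - 2) + (-3), so subtract 1 times column 2 from column 4:

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & x - 2 & 0 & -3 \\
0 & 0 & (x - 2)^2 & 0 \\
0 & 0 & 0 & x - 5
\end{pmatrix} .
$$

Now put -3 in the (2, 2) position:

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & -3 & 0 & x - 2 \\
0 & 0 & (x - 2)^2 & 0 \\
0 & x - 5 & 0 & 0
\end{pmatrix} .
$$

Add (x - 5)/3 times row 2 to row 4, and then add (x - 2)/3 times column 2 to
column 4 to obtain

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & -3 & 0 & 0 \\
0 & 0 & (x - 2)^2 & 0 \\
0 & 0 & 0 & (x - 2)(x - 5)/3
\end{pmatrix} .
$$

Elementary long division (see Example 6.2) shows that (x - 2)(x - 5)/3
divided by (x - 2)² equals 1/3 with a remainder of -x + 2. Following Step 3,
we add row 4 to row 3 and then subtract 1/3 times column 3 from column 4 to
obtain

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & -3 & 0 & 0 \\
0 & 0 & (x - 2)^2 & -x + 2 \\
0 & 0 & 0 & (x - 2)(x - 5)/3
\end{pmatrix} .
$$

Going back to Step 1, we first put -x + 2 = -(x - 2) in the (3, 3) position.
We then add (x - 5)/3 times row 3 to row 4 and (x - 2) times column 3 to
column 4 resulting in

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & -3 & 0 & 0 \\
0 & 0 & -(x - 2) & 0 \\
0 & 0 & 0 & (x - 2)^2 (x - 5)/3
\end{pmatrix} .
$$

Lastly, multiplying each row by a suitable nonzero scalar we obtain the final
(unique) Smith form

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & x - 2 & 0 \\
0 & 0 & 0 & (x - 2)^2 (x - 5)
\end{pmatrix} . \qquad //
$$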
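The steps in the proof of Theorem 8.8 can be turned into a short program. What follows is a sketch under stated assumptions and not a definitive implementation: polynomials are coefficient lists (lowest degree first) over the rationals, the names `trim`, `sub_mul`, `div` and `smith_form` are hypothetical, and the least-degree entry is searched for in the whole remaining block at each pass, so the intermediate matrices need not coincide with those of a hand computation. The final diagonal form is unique, so it must agree.

```python
from fractions import Fraction

def trim(p):
    """Normalise a coefficient list (lowest degree first); zero is []."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def sub_mul(a, q, b):
    """Return the polynomial a - q*b."""
    out = trim(a) + [Fraction(0)] * (len(q) + len(b))
    for i, x in enumerate(q):
        for j, y in enumerate(b):
            out[i + j] -= x * y
    return trim(out)

def div(a, b):
    """Return (quotient, remainder) of a by the nonzero polynomial b."""
    q, r = [], trim(a)
    while len(r) >= len(b):
        term = [0] * (len(r) - len(b)) + [r[-1] / b[-1]]
        q, r = sub_mul(q, [-1], term), sub_mul(r, term, b)
    return q, r

def smith_form(A):
    """Diagonalise a polynomial matrix by P-elementary row and column operations."""
    A = [[trim(p) for p in row] for row in A]
    m, n = len(A), len(A[0])
    for t in range(min(m, n)):
        while True:
            # Steps 1-2: bring a nonzero entry of least degree to position (t, t).
            cells = [(len(A[i][j]), i, j) for i in range(t, m)
                     for j in range(t, n) if A[i][j]]
            if not cells:
                return A                           # the remaining block is zero
            _, i, j = min(cells)
            A[t], A[i] = A[i], A[t]                # type alpha on rows
            for row in A:
                row[t], row[j] = row[j], row[t]    # type alpha on columns
            pivot = A[t][t]
            for i in range(t + 1, m):              # type gamma: reduce column t
                q = div(A[i][t], pivot)[0]
                A[i] = [sub_mul(A[i][c], q, A[t][c]) for c in range(n)]
            for j in range(t + 1, n):              # type gamma: reduce row t
                q = div(A[t][j], pivot)[0]
                for row in A:
                    row[j] = sub_mul(row[j], q, row[t])
            if any(A[i][t] for i in range(t + 1, m)) or \
               any(A[t][j] for j in range(t + 1, n)):
                continue                           # remainders of smaller degree remain
            # Step 3: look for an entry that the pivot does not divide.
            bad = [i for i in range(t + 1, m) for j in range(t + 1, n)
                   if div(A[i][j], pivot)[1]]
            if not bad:
                break
            A[t] = [sub_mul(A[t][c], [-1], A[bad[0]][c]) for c in range(n)]  # row t += row i
        A[t] = [[c / pivot[-1] for c in p] for p in A[t]]   # type beta: monic pivot
    return A

# xI - A for A with rows (2,1,0,0), (0,2,0,0), (0,0,2,0), (0,0,0,5)
char_matrix = [[[-2, 1], [-1], [], []],
               [[], [-2, 1], [], []],
               [[], [], [-2, 1], []],
               [[], [], [], [-5, 1]]]
S = smith_form(char_matrix)
```

For this input the diagonal of `S` is 1, 1, x - 2, (x - 2)²(x - 5), the last entry being stored as the coefficient list of x³ - 9x² + 24x - 20.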



Exercises 

1. Find the invariant factors of the matrix A given in Example 8.4 by using
the list of elementary divisors also given in that example.

2. For each of the following matrices A, find the invariant factors of the
characteristic matrix xI - A:

[The entries of the four matrices (a), (b), (c) and (d) are too garbled in
this copy to be recovered.]
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8.4 SIMILARITY INVARIANTS 

Recall that A, B ∈ M_n(ℱ) are similar over ℱ if there exists a nonsingular
matrix S ∈ M_n(ℱ) such that A = S^{-1}BS. Note that similar matrices are
therefore also equivalent, although the converse is certainly not true (since
in general P ≠ Q^{-1} in our definition of equivalent matrices). For our
present purposes however, the following theorem is quite useful.

Theorem 8.9 Two matrices A, B ∈ M_n(ℱ) are similar over ℱ if and only if
their characteristic matrices xI - A and xI - B are equivalent over 𝒫 = ℱ[x].
In particular, if xI - A = P(xI - B)Q where
Q^{-1} = R_m x^m + ⋯ + R_1 x + R_0, then A = S^{-1}BS where
S^{-1} = R_m B^m + ⋯ + R_1 B + R_0.

Proof If A and B are similar, then there exists a nonsingular matrix
S ∈ M_n(ℱ) for which A = S^{-1}BS, and hence

$$
xI - A = xI - S^{-1}BS = S^{-1}(xI - B)S .
$$

But S is a unit matrix in M_n(𝒫), and therefore xI - A and xI - B are
𝒫-equivalent.

On the other hand, if xI - A and xI - B are 𝒫-equivalent, then there exist
unit matrices P, Q ∈ M_n(𝒫) such that

$$
xI - A = P(xI - B)Q .
$$

We wish to find a matrix S ∈ M_n(ℱ) for which A = S^{-1}BS. Since Q ∈ M_n(𝒫)
is a unit matrix, we may apply Theorem 4.11 to find its inverse R ∈ M_n(𝒫)
which is also a unit matrix and hence will also have polynomial entries. In
fact, we may write (as in the proof of Theorem 7.10)

$$
R = R_m x^m + R_{m-1} x^{m-1} + \cdots + R_1 x + R_0 \tag{1}
$$

where m is the highest degree of any polynomial entry of R and each
R_i ∈ M_n(ℱ).

From xI - A = P(xI - B)Q and the fact that P and Q are unit matrices we have

$$
P^{-1}(xI - A) = (xI - B)Q = Qx - BQ . \tag{2}
$$

Now recall Theorem 8.5 and the discussion following its proof. If we write
both P^{-1} and Q ∈ M_n(𝒫) in the same form as we did in (1) for R, then we
may replace x by A in the resulting polynomial expression for Q to obtain a
matrix W ∈ M_n(ℱ). Since A commutes with I and A, and B ∈ M_n(ℱ), we may apply
Theorem 8.5 and replace x by A on both sides of (2), resulting in

$$
0 = WA - BW .
$$

Since R is the inverse of Q and Qx^i = x^iQ, we have RQ = I or (from (1))

$$
R_m Q x^m + R_{m-1} Q x^{m-1} + \cdots + R_1 Q x + R_0 Q = I .
$$

Replacing x by A in this expression yields

$$
\sum_{i=0}^{m} R_i W A^i = I . \tag{3}
$$

But WA = BW so that WA² = BWA = B²W and, by induction, it follows that
WA^i = B^iW. Using this in (3) we have

$$
\Big( \sum_{i=0}^{m} R_i B^i \Big) W = I
$$

so defining

$$
S^{-1} = \sum_{i=0}^{m} R_i B^i \in M_n(\mathcal{F}) \tag{4}
$$

we see that S^{-1} = W^{-1} and hence W = S. Finally, noting that WA = BW
implies A = W^{-1}BW, we arrive at A = S^{-1}BS as desired. ∎

Corollary 1 Two matrices A, B ∈ M_n(ℱ) are similar if and only if their
characteristic matrices have the same invariant factors (or elementary
divisors).

Proof This follows directly from Theorem 8.9 and the corollary to Theorem
8.8. ∎

Corollary 2 If A and B are in M_n(ℝ), then A and B are similar over ℂ if and
only if they are similar over ℝ.

Proof Clearly, if A and B are similar over ℝ then they are also similar over
ℂ. On the other hand, suppose that A and B are similar over ℂ. We claim that
the algorithm in the proof of Theorem 8.9 yields a real S if A and B are real.
From the definition of S in the proof of Theorem 8.9 (equation (4)), we see
that S will be real if all of the R_i are real (since each B^i is real by
hypothesis), and this in turn requires that Q be real (since R = Q^{-1}). That
P and Q can indeed be chosen to be real is left as an exercise for the reader
(see Exercise 8.4.1). ∎

The invariant factors of the characteristic matrix of A are called the
similarity invariants of A. We will soon show that the similarity invariant of
highest degree is just the minimal polynomial for A.



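Example 8.6 Let us show that the matrices

\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
   \quad\text{and}\quad
   B = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} \]

are similar over ℝ. We have the characteristic matrices

\[ xI - A = \begin{pmatrix} x-1 & -1 \\ -1 & x-1 \end{pmatrix}
   \quad\text{and}\quad
   xI - B = \begin{pmatrix} x & 0 \\ 0 & x-2 \end{pmatrix} \]

and hence the determinantal divisors are easily seen to be f_1(A) = 1,
f_2(A) = x(x - 2), f_1(B) = 1, f_2(B) = x(x - 2). Thus f_1(A) = f_1(B) and
f_2(A) = f_2(B) so that A and B must be similar by Corollary 1 of Theorem 8.9.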

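For the sake of illustration, we will show how to compute the matrix S^{-1}
following the method used in the proof of Theorem 8.9 (see equation (4)).
While there is no general method for finding the matrices P and Q, the reader
can easily verify that if we choose

\[ P = \begin{pmatrix} 1 & x^2 - x + 1 \\ -1 & -x^2 + x + 1 \end{pmatrix}
   \quad\text{and}\quad
   Q = \frac{1}{2} \begin{pmatrix} -x^2 + 3x - 1 & -x^2 + 3x - 3 \\ 1 & 1 \end{pmatrix} \]

then xI - A = P(xI - B)Q. It is then easy to show that

\[ R = Q^{-1} = \begin{pmatrix} 1 & x^2 - 3x + 3 \\ -1 & -x^2 + 3x - 1 \end{pmatrix}
     = \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix} x^2
     + \begin{pmatrix} 0 & -3 \\ 0 & 3 \end{pmatrix} x
     + \begin{pmatrix} 1 & 3 \\ -1 & -1 \end{pmatrix} \]

and hence (from (4)) we have

\[ S^{-1} = \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix}
            \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}^{2}
          + \begin{pmatrix} 0 & -3 \\ 0 & 3 \end{pmatrix}
            \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}
          + \begin{pmatrix} 1 & 3 \\ -1 & -1 \end{pmatrix}
          = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} . \]

/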
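The arithmetic in Example 8.6 can be checked mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the helper names (`matmul`, `inverse_2x2`) are illustrative and not from the text, and the coefficient matrices R0, R1, R2 are the ones read off from R = Q^{-1} in the example. It evaluates equation (4) and then confirms that S^{-1}BS reproduces A.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [1, 1]]
B = [[0, 0], [0, 2]]

# Coefficients of R = Q^{-1} = R2 x^2 + R1 x + R0, as read from the example.
R0 = [[1, 3], [-1, -1]]
R1 = [[0, -3], [0, 3]]
R2 = [[0, 1], [0, -1]]

# Equation (4): S^{-1} = R0 + R1 B + R2 B^2.
B2 = matmul(B, B)
terms = [R0, matmul(R1, B), matmul(R2, B2)]
S_inv = [[sum(T[i][j] for T in terms) for j in range(2)] for i in range(2)]
S = inverse_2x2(S_inv)

# Check the similarity A = S^{-1} B S.
similar = matmul(matmul(S_inv, B), S)
print(S_inv)
print(similar == A)
```

Exact fractions are used so that the comparison with A involves no rounding.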
Now recall the definition of minimal polynomial given in Section 7.3 (see
the discussion following the proof of Theorem 7.10). We also recall that the
minimal polynomial m(x) for A ∈ M_n(ℱ) divides the characteristic polynomial
Δ_A(x). In the particular case that m(x) = Δ_A(x), the matrix A is called
nonderogatory, and if m(x) ≠ Δ_A(x), then (as you might have guessed) A is
said to be derogatory. Our next theorem is of fundamental importance.

Theorem 8.10 The minimal polynomial m(x) for A ∈ M_n(ℱ) is equal to its
similarity invariant of highest degree.

Proof Since Δ_A(x) = det(xI - A) is just a (monic) polynomial of degree n in
x, it is clearly nonzero, and hence q_n(x), the similarity invariant of highest
degree, is also nonzero. Now define the matrix Q(x) = adj(xI - A), and note
that the entries of Q(x) are precisely all the (n - 1)-square subdeterminants
of xI - A. This means f_{n-1}(x) (i.e., the (n - 1)th determinantal divisor of
xI - A) is just the monic gcd of all the entries of Q(x), and therefore we may
write

Q(x) = f_{n-1}(x)D(x)

where the matrix D(x) has entries that are relatively prime. Noting that by
definition we have Δ_A(x) = f_n(x) = q_n(x)f_{n-1}(x), it follows that

f_{n-1}(x)D(x)(xI - A) = Q(x)(xI - A) = Δ_A(x)I = q_n(x)f_{n-1}(x)I (1)

where we used equation (1b) of Section 4.3. Since f_{n-1}(x) ≠ 0 (by Theorem
8.6(a) and the fact that f_n(x) ≠ 0), we must have

D(x)(xI - A) = q_n(x)I (2)

(this follows by equating the polynomial entries of the matrices on each side
of (1) and then using Corollary 2(b) of Theorem 6.2).

By writing both sides of (2) as polynomials with matrix coefficients and
then applying Theorem 8.5, it follows that q_n(A) = 0 and hence m(x)|q_n(x)
(Theorem 7.4). We may now define the polynomial p(x) by writing

q_n(x) = m(x)p(x) . (3)

By definition, A is a root of m(x), and therefore our discussion at the end of
Section 8.2 tells us that we may apply Theorem 6.4 to write

m(x)I = C(x)(xI - A)

where C(x) is a polynomial with matrix coefficients. Using this in (2) we have

D(x)(xI - A) = q_n(x)I = p(x)m(x)I = p(x)C(x)(xI - A) (4)

where we used the fact that m(x) and p(x) are just polynomials with scalar
coefficients so that m(x)p(x) = p(x)m(x).

Since det(xI - A) ≠ 0, we know that (xI - A)^{-1} exists (as a matrix whose
entries are quotients of polynomials in x), and thus (4) implies that

D(x) = p(x)C(x) .

Now regarding both D(x) and C(x) as matrices with polynomial entries, this
equation shows that p(x) divides each of the entries of D(x). But the entries of
D(x) are relatively prime, and hence p(x) must be a unit (i.e., a nonzero
scalar). Since both m(x) and q_n(x) are monic by convention, (3) implies that
p(x) = 1, and therefore q_n(x) = m(x). ∎

Corollary A matrix A ∈ M_n(ℱ) is nonderogatory if and only if its first n - 1
similarity invariants are equal to 1.

Proof Let A have characteristic polynomial Δ_A(x) and minimal polynomial
m(x). Using the definition of invariant factors and Theorem 8.10 we have

Δ_A(x) = det(xI - A) = f_n(x) = q_n(x)q_{n-1}(x) ··· q_1(x)

       = m(x)q_{n-1}(x) ··· q_1(x) .

Clearly, if q_{n-1}(x) = ··· = q_1(x) = 1 then Δ_A(x) = m(x). On the other
hand, if Δ_A(x) = m(x), then q_{n-1}(x) ··· q_1(x) = 1 (Theorem 6.2, Corollary
2(b)) and hence each q_i(x) (i = 1, . . . , n - 1) is a nonzero scalar (Theorem
6.2, Corollary 3). Since each q_k(x) is defined to be monic, it follows that
q_{n-1}(x) = ··· = q_1(x) = 1. ∎

Example 8.7 Comparison of Examples 7.3 and 8.8 shows that the minimal 
polynomial of the matrix A is indeed the same as its similarity invariant of 
highest degree. / 
Exercises



1. Finish the proof of Corollary 2 to Theorem 8.9. 

2. Show that the minimal polynomial for A ∈ M_n(ℱ) is the least common
multiple of the elementary divisors of xI - A.

3. If (x^ - 4)^ is the minimal polynomial of an n- square matrix A, can A^ - 
A"^ + A^ - In ever be zero? If (x^ - 4)^ is the minimal polynomial, can 
A^ - A'^ + A^ - I„ = 0? Explain. 

4. Is the matrix

[matrix entries not recoverable from the source]

derogatory or nonderogatory? Explain.



5. Suppose A is an n-square matrix and p is a polynomial with complex
coefficients. If p(A) = 0, show that p(SAS^{-1}) = 0 for any nonsingular
n-square S. Is this true if p is a polynomial with n-square matrices as
coefficients?

6. Prove or disprove: 

(a) The elementary divisors of A are all linear if and only if the charac- 
teristic polynomial of A is a product of distinct linear factors. 

(b) The elementary divisors of A are all linear if and only if the minimal 
polynomial of A is a product of distinct linear factors. 

7. Prove or disprove: 

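(a) There exists a real nonsingular matrix S such that SAS^{-1} = B where

A = [matrix missing in source] and B = [matrix missing in source] .

(b) There exists a complex nonsingular matrix S such that SAS^{-1} = B
where

\[ A = \begin{pmatrix} 3 & 0 \\ -1 & 2 \end{pmatrix} \]

and B = [matrix missing in source] .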
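Given any monic polynomial p(x) = x^n - a_{n-1}x^{n-1} - ··· - a_0 ∈ ℱ[x], the
matrix C(p(x)) ∈ M_n(ℱ) defined by

\[ C(p(x)) = \begin{pmatrix}
   0 & 0 & \cdots & 0 & a_0 \\
   1 & 0 & \cdots & 0 & a_1 \\
   0 & 1 & \cdots & 0 & a_2 \\
   \vdots & \vdots & & \vdots & \vdots \\
   0 & 0 & \cdots & 1 & a_{n-1}
   \end{pmatrix} \]
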

is called the companion matrix of the polynomial p(x). If there is no possible
ambiguity, we will denote the companion matrix simply by C. The companion
matrix has several interesting properties that we will soon discover. We will
also make use of the associated characteristic matrix xI - C ∈ M_n(𝒫) given by

\[ xI - C = \begin{pmatrix}
   x & 0 & \cdots & 0 & -a_0 \\
   -1 & x & \cdots & 0 & -a_1 \\
   \vdots & \vdots & & \vdots & \vdots \\
   0 & 0 & \cdots & x & -a_{n-2} \\
   0 & 0 & \cdots & -1 & x - a_{n-1}
   \end{pmatrix} . \]



Our next theorem is quite useful. 



Theorem 8.11 Let p(x) = x^n - a_{n-1}x^{n-1} - ··· - a_0 ∈ ℱ[x] have companion
matrix C. Then det(xI - C) = p(x).



Proof We proceed by induction on the degree of p(x). If n = 1, then p(x) =
x - a_0, C = (a_0) and xI - C = (x - a_0) so that

det(xI - C) = x - a_0 = p(x) .

Now assume that the theorem is true for all polynomials of degree less than n,
and suppose deg p(x) = n > 1. If we expand det(xI - C) by minors of the first
row, we obtain (see Theorem 4.10)

det(xI - C) = x det C_{11} + (-a_0)(-1)^{n+1} det C_{1n}

where the minor matrices C_{11} and C_{1n} are given by

\[ C_{11} = \begin{pmatrix}
   x & 0 & \cdots & 0 & -a_1 \\
   -1 & x & \cdots & 0 & -a_2 \\
   \vdots & \vdots & & \vdots & \vdots \\
   0 & 0 & \cdots & x & -a_{n-2} \\
   0 & 0 & \cdots & -1 & x - a_{n-1}
   \end{pmatrix}
   \qquad
   C_{1n} = \begin{pmatrix}
   -1 & x & 0 & \cdots & 0 \\
   0 & -1 & x & \cdots & 0 \\
   \vdots & \vdots & \vdots & & \vdots \\
   0 & 0 & 0 & \cdots & x \\
   0 & 0 & 0 & \cdots & -1
   \end{pmatrix} . \]

Defining the polynomial p'(x) = x^{n-1} - a_{n-1}x^{n-2} - ··· - a_2x - a_1
along with its companion matrix C', we see that C_{11} = xI - C'. By our
induction hypothesis, it then follows that

det C_{11} = det(xI - C') = p'(x) .

Next we note that C_{1n} is an upper-triangular matrix, and hence (by Theorem
4.5) det C_{1n} = (-1)^{n-1}. Putting all of this together we find that

det(xI - C) = xp'(x) - a_0 = p(x) . ∎
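Theorem 8.11 is easy to test numerically. The sketch below (Python, exact rational arithmetic) builds the companion matrix in the convention used above, with a_0, . . . , a_{n-1} down the last column, and computes det(xI - C) by the Faddeev-LeVerrier recursion. That recursion is a standard technique swapped in here for convenience; it is not the expansion by minors used in the proof, and the function names are illustrative only.

```python
from fractions import Fraction

def companion(a):
    """Companion matrix of p(x) = x^n - a[n-1] x^(n-1) - ... - a[0],
    with a[0], ..., a[n-1] placed down the last column."""
    n = len(a)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)       # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = Fraction(a[i])    # last column holds a_0, ..., a_{n-1}
    return C

def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients of det(xI - A), lowest degree first, computed with the
    Faddeev-LeVerrier recursion (the leading coefficient is 1)."""
    n = len(A)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]              # c_n = 1
    M = identity                        # M_1 = I
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k    # c_{n-k} = -tr(A M_k)/k
        coeffs.append(c)
        M = [[AM[i][j] + c * identity[i][j] for j in range(n)]
             for i in range(n)]
    return coeffs[::-1]

# p(x) = x^3 - 2x^2 - 5x - 7, i.e. a_0 = 7, a_1 = 5, a_2 = 2 (arbitrary test data).
a = [7, 5, 2]
coeffs = char_poly(companion(a))
print(coeffs)
```

The coefficient list should read -a_0, -a_1, . . . , -a_{n-1}, 1, which is exactly p(x).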



Recall that two matrix representations are similar if and only if they repre- 
sent the same underlying operator in two different bases (see Theorem 5.18). 

Theorem 8.12 (a) The companion matrix C = C(p(x)) of any monic
polynomial p(x) ∈ ℱ[x] has p(x) as its minimal polynomial m(x).

(b) If dim V = n and T ∈ L(V) has minimal polynomial m(x) of degree n,
then C(m(x)) represents T relative to some basis for V.

Proof (a) From the preceding proof, we see that deleting the first row and
nth column of xI - C and taking the determinant yields det C_{1n} = (-1)^{n-1}.
Therefore f_{n-1}(x) = 1 so that q_1(x) = q_2(x) = ··· = q_{n-1}(x) = 1. Hence
C is nonderogatory (corollary to Theorem 8.10), so that by Theorem 8.11 we have
m(x) = q_n(x) = det(xI - C) = p(x).

(b) Note dim V = deg Δ_T(x) = n = deg m(x) so that any [T] has similarity
invariants q_1(x) = ··· = q_{n-1}(x) = 1 and q_n(x) = m(x) (see Theorem 8.10
and its corollary). Since the proof of part (a) showed that C = C(m(x)) has the
same similarity invariants as [T], it follows from Corollary 1 of Theorem 8.9
that C and [T] are similar. ∎

Note that Theorems 8.11 and 8.12(a) together show that the companion 
matrix is nonderogatory. 
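A direct way to see this on a concrete matrix: for a companion matrix C, the vectors e_1, Ce_1, . . . , C^{n-1}e_1 are just the standard basis vectors, so no nonzero polynomial of degree less than n can annihilate C, while p(C) = 0. The following Python sketch checks both facts for one sample polynomial; the helper names and the sample coefficients are illustrative assumptions, not part of the text.

```python
def companion(a):
    """Companion matrix of x^n - a[n-1] x^(n-1) - ... - a[0]
    (a_0, ..., a_{n-1} down the last column, ones on the subdiagonal)."""
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = a[i]
    return C

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

a = [-1, 2, -3, 4, -3, 2]          # sample coefficients a_0, ..., a_5
n = len(a)
C = companion(a)
identity = [[int(i == j) for j in range(n)] for i in range(n)]

# The vectors e_1, C e_1, ..., C^(n-1) e_1.
v = identity[0][:]
krylov = []
for _ in range(n):
    krylov.append(v)
    v = matvec(C, v)

# p(C) = C^n - a_{n-1} C^(n-1) - ... - a_0 I.
powers = [identity]
for _ in range(n):
    powers.append(matmul(powers[-1], C))
p_of_C = [[powers[n][i][j] - sum(a[k] * powers[k][i][j] for k in range(n))
           for j in range(n)] for i in range(n)]

print(krylov == identity)                          # C^k e_1 = e_{k+1}
print(all(x == 0 for row in p_of_C for x in row))  # p(C) = 0
```

Since g(C)e_1 is the coefficient vector of g whenever deg g < n, linear independence of these vectors forces the minimal polynomial to have degree n.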

Given any A ∈ M_n(ℱ), we can interpret A as the matrix representation of a
linear transformation T on an n-dimensional vector space V. If A has minimal
polynomial m(x) with deg m(x) = n, then so does T (by Theorem 7.1). Hence
the companion matrix C of m(x) represents T relative to some basis for V
(Theorem 8.12(b)). This means that A is similar to C (Theorem 5.18), and
therefore C = P^{-1}AP for some nonsingular transition matrix P ∈ M_n(ℱ). But
then

xI - C = xI - P^{-1}AP = P^{-1}(xI - A)P

and hence det(xI - C) = det(xI - A) by Theorem 4.8 and its corollary. Using
Theorem 8.11, we then have the following result.

Theorem 8.13 Let A ∈ M_n(ℱ) have minimal polynomial m(x) of degree n.
Then m(x) = det(xI - A).

Our next theorem is a useful restatement of what we have done so far in 
this section. 

Theorem 8.14 Let p(x) = x^n - a_{n-1}x^{n-1} - ··· - a_0 ∈ ℱ[x]. Then the
companion matrix C(p(x)) is nonderogatory, and its characteristic polynomial
Δ_C(x) and minimal polynomial m(x) both equal p(x). Moreover, xI - C is
equivalent over 𝒫 to the n x n matrix (the Smith canonical form of xI - C)

\[ \begin{pmatrix}
   1 & \cdots & 0 & 0 \\
   \vdots & \ddots & \vdots & \vdots \\
   0 & \cdots & 1 & 0 \\
   0 & \cdots & 0 & p(x)
   \end{pmatrix} . \]

For notational convenience, we sometimes write a diagonal matrix by
listing its diagonal entries. For example, the matrix shown in the above
theorem would be written as diag(1, . . . , 1, p(x)).

Theorem 8.15 If A ∈ M_n(ℱ), then A is similar over ℱ to the direct sum of
the companion matrices of its nonunit similarity invariants.

Proof The proof is an application of Theorem 8.9. Assume that A ≠ cI
(where c ∈ ℱ) or there is nothing to prove. Hence f_1(x), the first
determinantal divisor of xI - A, must be 1. But f_0(x) = 1 by definition, and
hence we have f_1(x) = q_1(x)f_0(x) = q_1(x) = 1. Since at least q_1(x) = 1, we
now assume that in fact the first k similarity invariants of A are equal to 1.
In other words, we assume that q_1(x) = ··· = q_k(x) = 1, and then
deg q_i(x) = d_i ≥ 1 for i = k + 1, . . . , n.

Since f_n(x) = q_1(x) ··· q_n(x), Theorem 6.2(b) tells us that deg f_n(x) =
Σ_{j=1}^{n} deg q_j(x) and hence (using deg q_j = 0 for j = 1, . . . , k)

\[ n = \deg \Delta_A(x) = \deg f_n(x) = \sum_{j=k+1}^{n} \deg q_j(x) = \sum_{j=k+1}^{n} d_j . \]

Let Q_i = C(q_i(x)) ∈ M_{d_i}(ℱ) for i = k + 1, . . . , n. We want to show that
xI - A is equivalent over 𝒫 to

xI - (Q_{k+1} ⊕ ··· ⊕ Q_n) = (xI - Q_{k+1}) ⊕ ··· ⊕ (xI - Q_n) .

(Note that each of the identity matrices in this equation may be of a different
size.)

It should be clear that the Smith form of xI - A is the diagonal n x n
matrix

(xI - A)_S = diag(q_1(x), . . . , q_n(x)) = diag(1, . . . , 1, q_{k+1}(x), . . . , q_n(x)) .

From Theorem 8.14, we know that (xI - Q_i)_S = diag(1, . . . , 1, q_i(x)) ∈
M_{d_i}(𝒫). Since Σ_{i=k+1}^{n} d_i = n, we now see that by suitable row and
column interchanges we have

xI - A ~ (xI - A)_S ~ (xI - Q_{k+1})_S ⊕ ··· ⊕ (xI - Q_n)_S (*)

where ~ denotes equivalence over 𝒫.

If we write (xI - Q_i)_S = E_i(xI - Q_i)F_i where E_i and F_i are unit
matrices, then (by multiplying out the block diagonal matrices) it is easy to
see that

E_{k+1}(xI - Q_{k+1})F_{k+1} ⊕ ··· ⊕ E_n(xI - Q_n)F_n

  = [E_{k+1} ⊕ ··· ⊕ E_n][(xI - Q_{k+1}) ⊕ ··· ⊕ (xI - Q_n)][F_{k+1} ⊕ ··· ⊕ F_n] .

Since the direct sum of unit matrices is clearly a unit matrix (so that both
E_{k+1} ⊕ ··· ⊕ E_n and F_{k+1} ⊕ ··· ⊕ F_n are unit matrices), this shows that
the right hand side of (*) is equivalent to (xI - Q_{k+1}) ⊕ ··· ⊕ (xI - Q_n).
(Note we have shown that if {S_i} and {T_i} are finite collections of matrices
such that S_i ~ T_i, then it follows that S_1 ⊕ ··· ⊕ S_n ~ T_1 ⊕ ··· ⊕ T_n.)
Therefore xI - A is equivalent to xI - (Q_{k+1} ⊕ ··· ⊕ Q_n) which is what we
wanted to show. The theorem now follows directly from Theorem 8.9. ∎

We are now in a position to prove the rational canonical form theorem. 
Note that the name is derived from the fact that the rational form of a matrix is 
obtained by the application of a finite number of rational operations (which 
essentially constitute the Smith algorithm). 

Theorem 8.16 (Rational Canonical Form) A matrix A ∈ M_n(ℱ) is similar
over ℱ to the direct sum of the companion matrices of the elementary divisors
of xI - A.

Proof As in the proof of Theorem 8.15, we assume that the first k similarity
invariants of A are q_1(x) = ··· = q_k(x) = 1 and that deg q_i(x) = d_i ≥ 1 for
i = k + 1, . . . , n. Changing notation slightly from our first definition, we
write each nonunit invariant factor as a product of powers of prime polynomials
(i.e., as a product of elementary divisors): q_i(x) = e_{i1}(x) ··· e_{im_i}(x)
for each i = k + 1, . . . , n. From Theorem 8.14, we know that xI - Q_i =
xI - C(q_i(x)) is 𝒫-equivalent to the d_i x d_i matrix

B_i = diag(1, . . . , 1, q_i(x)) .

Similarly, if c_{ij} = deg e_{ij}(x), each xI - C(e_{ij}(x)) (j = 1, . . . , m_i)
is 𝒫-equivalent to a c_{ij} x c_{ij} matrix

D_{ij} = diag(1, . . . , 1, e_{ij}(x)) .

Since deg q_i(x) = Σ_j deg e_{ij}(x), it follows that the block diagonal matrix

D_i = D_{i1} ⊕ ··· ⊕ D_{im_i}

    = diag(1, . . . , 1, e_{i1}(x)) ⊕ diag(1, . . . , 1, e_{i2}(x))

      ⊕ ··· ⊕ diag(1, . . . , 1, e_{im_i}(x))

is also a d_i x d_i matrix. We first show that B_i (and hence also xI - Q_i) is
𝒫-equivalent to D_i.

Consider the collection of all (d_i - 1) x (d_i - 1) subdeterminants of D_i.
For each r = 1, . . . , m_i, this collection will contain that subdeterminant
obtained by deleting the row and column containing e_{ir}. In particular, this
subdeterminant will be Π_{j≠r} e_{ij}. But the gcd of all such subdeterminants
taken over r (for a fixed i of course) is just 1. (To see this, consider the
product abcd of pairwise relatively prime factors, as the e_{ij} are for fixed
i. If we look at the collection of products obtained by deleting one of a, b, c
or d we obtain {bcd, acd, abd, abc}. Since there is no factor in common with
all four of these products, it follows that the gcd of this collection is 1.)
Therefore the (d_i - 1)th determinantal divisor f_{d_i - 1}(x) of D_i is 1, and
hence the fact that f_{k-1}(x) divides f_k(x) means f_1(x) = ··· =
f_{d_i - 1}(x) = 1 and f_{d_i}(x) = Π_j e_{ij}(x) = q_i(x). From the definition
of determinantal divisor (or the definition of invariant factor along with the
fact that B_i is in its Smith canonical form), it is clear that B_i has
precisely these same determinantal divisors, and hence (by the corollary to
Theorem 8.8) B_i must be 𝒫-equivalent to D_i.

All that remains is to put this all together and apply Theorem 8.9 again.
We now take the direct sum of each side of the equivalence relation
xI - Q_i ~ B_i ~ D_{i1} ⊕ ··· ⊕ D_{im_i} = D_i using the fact that (as we saw
in the proof of Theorem 8.15) (xI - Q_{k+1}) ⊕ ··· ⊕ (xI - Q_n) ~
D_{k+1} ⊕ ··· ⊕ D_n. It will be convenient to denote direct sums by Σ⊕. For
example, we have already seen it is true in general that

\[ \sum_{i=k+1}^{n} \oplus\, (xI - Q_i) = xI - \sum_{i=k+1}^{n} \oplus\, Q_i \]

(where we again remark that the identity matrices in this equation may be of
different dimensions). Therefore, we have shown that

\[ xI - \sum_{i=k+1}^{n} \oplus\, Q_i
   = \sum_{i=k+1}^{n} \oplus\, (xI - Q_i)
   \sim \sum_{i=k+1}^{n} \oplus\, (D_{i1} \oplus \cdots \oplus D_{im_i})
   \sim \sum_{i=k+1}^{n} \oplus \Bigl( \sum_{j=1}^{m_i} \oplus\, [xI - C(e_{ij}(x))] \Bigr)
   = xI - \sum_{i=k+1}^{n} \oplus \Bigl( \sum_{j=1}^{m_i} \oplus\, C(e_{ij}(x)) \Bigr) \]

and hence Σ⊕_{i=k+1}^{n} Q_i is similar over ℱ to
Σ⊕_{i=k+1}^{n} [Σ⊕_{j=1}^{m_i} C(e_{ij}(x))]. But Theorem 8.15 tells us that A
is similar over ℱ to Σ⊕_{i=k+1}^{n} Q_i, and therefore we have shown that A is
similar over ℱ to

\[ \sum_{i=k+1}^{n} \oplus \Bigl( \sum_{j=1}^{m_i} \oplus\, C(e_{ij}(x)) \Bigr) . \]

∎

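Example 8.8 Consider the polynomial

p(x) = (x - 1)^2(x^2 + 1)^2 = x^6 - 2x^5 + 3x^4 - 4x^3 + 3x^2 - 2x + 1 .

Its companion matrix is

\[ C = \begin{pmatrix}
   0 & 0 & 0 & 0 & 0 & -1 \\
   1 & 0 & 0 & 0 & 0 & 2 \\
   0 & 1 & 0 & 0 & 0 & -3 \\
   0 & 0 & 1 & 0 & 0 & 4 \\
   0 & 0 & 0 & 1 & 0 & -3 \\
   0 & 0 & 0 & 0 & 1 & 2
   \end{pmatrix} . \]
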

According to Theorem 8.14, C is nonderogatory and its minimal polynomial is 
p(x). Then by Theorem 8.10 and its corollary, the only nonunit similarity 
invariant of C is also p(x). This means that C is already in the form given by 
Theorem 8.15. 

The elementary divisors (in ℝ[x]) of xI - C are

e_1(x) = (x - 1)^2 = x^2 - 2x + 1

and

e_2(x) = (x^2 + 1)^2 = x^4 + 2x^2 + 1 .

These have the companion matrices

\[ C(e_1(x)) = \begin{pmatrix} 0 & -1 \\ 1 & 2 \end{pmatrix}
   \qquad
   C(e_2(x)) = \begin{pmatrix}
   0 & 0 & 0 & -1 \\
   1 & 0 & 0 & 0 \\
   0 & 1 & 0 & -2 \\
   0 & 0 & 1 & 0
   \end{pmatrix} \]

and hence Theorem 8.16 tells us that C is similar over ℝ to the direct sum
C(e_1(x)) ⊕ C(e_2(x)). We leave it to the reader to find the rational canonical
form of C if we regard it as a matrix over ℂ. /
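As a consistency check on Example 8.8, the sketch below (Python, integer arithmetic; all function names are illustrative) multiplies out e_1(x)e_2(x), forms the direct sum D = C(e_1(x)) ⊕ C(e_2(x)), and verifies that p(D) = 0 while neither (x - 1)(x^2 + 1)^2 nor (x - 1)^2(x^2 + 1) annihilates D. This is consistent with D having minimal polynomial p(x), as it must if D is similar to C; it does not by itself construct the similarity.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def companion(monic):
    """Companion matrix of the monic polynomial c_0 + c_1 x + ... + x^n.
    In the notation of the text a_k = -c_k fills the last column."""
    n = len(monic) - 1
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -monic[i]
    return C

def direct_sum(X, Y):
    """Block diagonal matrix with blocks X and Y."""
    n, m = len(X), len(Y)
    return [row + [0] * m for row in X] + [[0] * n + row for row in Y]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at(poly, M):
    """Evaluate a polynomial at a square matrix by Horner's rule."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(poly):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result

def is_zero(M):
    return all(x == 0 for row in M for x in row)

x_minus_1 = [-1, 1]
x2_plus_1 = [1, 0, 1]
e1 = poly_mul(x_minus_1, x_minus_1)     # (x - 1)^2
e2 = poly_mul(x2_plus_1, x2_plus_1)     # (x^2 + 1)^2
p = poly_mul(e1, e2)

D = direct_sum(companion(e1), companion(e2))
g1 = poly_mul(x_minus_1, e2)            # one factor of (x - 1) dropped
g2 = poly_mul(e1, x2_plus_1)            # one factor of (x^2 + 1) dropped

print(p)
print(is_zero(poly_at(p, D)), is_zero(poly_at(g1, D)), is_zero(poly_at(g2, D)))
```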



Exercises 



1. Prove Corollary 1 of Theorem 7.24 using the rational canonical form. 
2. (a) Let V be a real 6-dimensional vector space, and suppose T ∈ L(V) has
minimal polynomial m(x) = (x^ - x + 3)(x - 2)^. Write down all possible
rational canonical forms for T (except for the order of the blocks).

(b) Let V be a real 7-dimensional vector space, and suppose T ∈ L(V) has
minimal polynomial m(x) = (x^ + 2)(x + 3)-^. Write down all possible
rational canonical forms for T (except for the order of the blocks).

3. Let A be a 4 x 4 matrix with minimal polynomial m(x) = (x^2 + 1)(x^2 - 3).
Find the rational canonical form if A is a matrix over:

(a) The rational field Q. 

(b) The real field R. 

(c) The complex field C. 

4. Find the rational canonical form for the Jordan block

\[ \begin{pmatrix}
   a & 1 & 0 & 0 \\
   0 & a & 1 & 0 \\
   0 & 0 & a & 1 \\
   0 & 0 & 0 & a
   \end{pmatrix} . \]


5. Find a 3 x 3 matrix A with integral entries such that A^3 + 3A^2 + 2A + 2 =
0. Prove that your matrix satisfies this identity.

6. Discuss the validity of each of the following assertions: 

(a) Two square matrices are similar if and only if they have the same 
eigenvalues (including multiplicities). 

(b) Two square matrices are similar if and only if they have the same 
minimal polynomial. 

(c) Two square matrices are similar if and only if they have the same ele- 
mentary divisors. 

(d) Two square matrices are similar if and only if they have the same 
determinantal divisors. 

7. Suppose A = B ⊕ C where B and C are square matrices. Is the list of
elementary divisors of A equal to the list of elementary divisors of B
concatenated with (i.e., "added on to") the list of elementary divisors of C?
What if "elementary divisors" is replaced by "invariant factors" or
"determinantal divisors" in this statement?
8.6 THE JORDAN CANONICAL FORM

We have defined a canonical form as that matrix representation A of a linear
transformation T ∈ L(V) that is of a particularly simple form in some basis for
V. If all the eigenvalues of T lie in the base field ℱ, then the minimal
polynomial m(x) for T will factor into a product of linear terms. In addition,
if the eigenvalues are all distinct, then T will be diagonalizable (Theorem
7.24). But in the general case of repeated roots, we must (so far) fall back to
the triangular form described in Chapter 7 and in Section 8.1. However, in this
more general case there is another very important form that follows easily from
what we have already done. If A ∈ M_n(ℂ), then (by Theorem 6.13) all the
elementary divisors of xI - A will be of the simple form (x - a)^k. We shall
now investigate the "simplest" form that such an A can take.

To begin with, given a polynomial p(x) = (x - a_0)^n ∈ ℱ[x], we define the
hypercompanion matrix H(p(x)) ∈ M_n(ℱ) to be the upper-triangular matrix

\[ H(p(x)) = \begin{pmatrix}
   a_0 & 1 & 0 & \cdots & 0 & 0 \\
   0 & a_0 & 1 & \cdots & 0 & 0 \\
   \vdots & \vdots & \vdots & & \vdots & \vdots \\
   0 & 0 & 0 & \cdots & a_0 & 1 \\
   0 & 0 & 0 & \cdots & 0 & a_0
   \end{pmatrix} . \]

A matrix of this form is also referred to as a basic Jordan block belonging to
a_0. Now consider the characteristic matrix xI - H(p(x)). Note that if we
delete the nth row and first column of this characteristic matrix, we obtain a
lower-triangular matrix with all diagonal entries equal to -1, and hence its
determinant is equal to (-1)^{n-1}. Thus the corresponding determinantal
divisor f_{n-1}(x) is equal to 1, and therefore f_1(x) = ··· = f_{n-1}(x) = 1
(because f_{k-1}(x)|f_k(x)). Using f_k(x) = q_k(x)f_{k-1}(x), it follows that
q_1(x) = ··· = q_{n-1}(x) = 1, and thus H is nonderogatory (corollary to
Theorem 8.10). Since it is obvious that Δ_H(x) = (x - a_0)^n = p(x), we
conclude that Δ_H(x) = m(x) = p(x). (Alternatively, by Theorem 8.10, we see
that the minimal polynomial for H is q_n(x) = f_n(x) = (x - a_0)^n which is
also just the characteristic polynomial of H.) Along with the definition of the
Smith canonical form, this proves the following result analogous to Theorem
8.14.
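To make the argument concrete, here is the case n = 3 written out (a worked illustration only; the size 3 is chosen arbitrarily). The characteristic matrix of the hypercompanion matrix of (x - a_0)^3, and the minor obtained by deleting its third row and first column, are

```latex
\[
xI - H =
\begin{pmatrix}
x - a_0 & -1 & 0 \\
0 & x - a_0 & -1 \\
0 & 0 & x - a_0
\end{pmatrix},
\qquad
\det
\begin{pmatrix}
-1 & 0 \\
x - a_0 & -1
\end{pmatrix}
= 1 = (-1)^{2} .
\]
% Hence f_2(x) = 1, so f_1(x) = 1 as well, while f_3(x) = det(xI - H) = (x - a_0)^3.
\[
q_1(x) = q_2(x) = 1, \qquad q_3(x) = \frac{f_3(x)}{f_2(x)} = (x - a_0)^3 .
\]
```

so the only nonunit similarity invariant is (x - a_0)^3, in agreement with the general statement.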

Theorem 8.17 The hypercompanion matrix H(p(x)) of the polynomial p(x) =
(x - a)^n ∈ ℱ[x] is nonderogatory, and its characteristic and minimal
polynomials both equal p(x). Furthermore, the Smith form of xI - H(p(x)) is

\[ \begin{pmatrix}
   1 & 0 & \cdots & 0 & 0 \\
   0 & 1 & \cdots & 0 & 0 \\
   \vdots & \vdots & & \vdots & \vdots \\
   0 & 0 & \cdots & 1 & 0 \\
   0 & 0 & \cdots & 0 & p(x)
   \end{pmatrix} . \]




Theorems 8.14 and 8.17 show that given the polynomial p(x) = (x - a)^n ∈
ℂ[x], both C = C(p(x)) and H = H(p(x)) have precisely the same similarity
invariants. Using Theorem 8.16, we then see that C and H are similar over ℂ.
Now, if A ∈ M_n(ℂ), we know that the elementary divisors of xI - A will be of
the form (x - a)^k. Furthermore, Theorem 8.16 shows us that A is similar over
ℂ to the direct sum of the companion matrices of these elementary divisors.
But each companion matrix is similar over ℂ to the corresponding
hypercompanion matrix, and hence A is similar over ℂ to the direct sum of the
hypercompanion matrices of the elementary divisors of xI - A.

It may be worth briefly showing that the notions of similarity and direct
sums may be treated in the manner just claimed. In other words, denoting
similarity over ℂ by ~, we suppose that A ~ C_1 ⊕ C_2 = S^{-1}AS for some
nonsingular matrix S ∈ M_n(ℂ). We now also assume that C_i ~ H_i =
T_i^{-1}C_iT_i for each i = 1, 2. Then we see that

\[ \begin{pmatrix} H_1 & 0 \\ 0 & H_2 \end{pmatrix}
   = \begin{pmatrix} T_1^{-1}C_1T_1 & 0 \\ 0 & T_2^{-1}C_2T_2 \end{pmatrix}
   = \begin{pmatrix} T_1^{-1} & 0 \\ 0 & T_2^{-1} \end{pmatrix}
     \begin{pmatrix} C_1 & 0 \\ 0 & C_2 \end{pmatrix}
     \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix} \]

which (in an obvious shorthand notation) may be written in the form H =
T^{-1}CT if we note that

\[ \begin{pmatrix} T_1^{-1} & 0 \\ 0 & T_2^{-1} \end{pmatrix}
   = \begin{pmatrix} T_1 & 0 \\ 0 & T_2 \end{pmatrix}^{-1} . \]

We therefore have H = T^{-1}CT = T^{-1}S^{-1}AST = (ST)^{-1}A(ST) which shows
that A is indeed similar to the direct sum of the hypercompanion matrices. In
any case, we have proved the difficult part of the next very important theorem
(see also Theorem 7.42).
Theorem 8.18 (Jordan Canonical Form) If A ∈ M_n(ℂ), then A is similar
over ℂ to the direct sum of the hypercompanion matrices of all the elementary
divisors of xI - A, and this direct sum is unique except for the order of the
blocks. Moreover, the numbers appearing on the main diagonal of the Jordan
form are precisely the eigenvalues of A. (Note that the field ℂ can be replaced
by an arbitrary field ℱ if all the eigenvalues of A lie in ℱ.)

Proof Existence was proved in the above discussion, so we now consider
uniqueness. According to our general prescription, given a matrix A ∈ M_n(ℂ),
we would go through the following procedure to find its Jordan form. First we
reduce the characteristic matrix xI - A to its unique Smith form, thus
obtaining the similarity invariants of A. These similarity invariants are then
factored (over ℂ) to obtain the elementary divisors of xI - A. Finally, the
corresponding hypercompanion matrices are written down, and the Jordan form of
A is just their direct sum.

All that remains is to prove the statement about the eigenvalues of A. To
see this, recall that the eigenvalues of A are the roots of the characteristic
polynomial det(xI - A). Suppose that J = S^{-1}AS is the Jordan form of A. Then
the eigenvalues of J are the roots of

det(xI - J) = det(xI - S^{-1}AS) = det[S^{-1}(xI - A)S] = det(xI - A)

so that A and J have the same eigenvalues. But J is an upper-triangular matrix,
and hence the roots of det(xI - J) are precisely the diagonal entries of J. ∎

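Example 8.9 Referring to Example 8.8, we regard C as a matrix in M_6(ℂ).
Then its elementary divisors are e_1(x) = (x - 1)^2, e_2(x) = (x + i)^2 and
e_3(x) = (x - i)^2. The corresponding hypercompanion matrices are

\[ H_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
   \qquad
   H_2 = \begin{pmatrix} -i & 1 \\ 0 & -i \end{pmatrix}
   \qquad
   H_3 = \begin{pmatrix} i & 1 \\ 0 & i \end{pmatrix} \]

and therefore C is similar over ℂ to its Jordan form H_1 ⊕ H_2 ⊕ H_3. /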
Our next theorem is really a corollary to Theorem 8.18, but it is a
sufficiently important result that we single it out by itself.

Theorem 8.19 The geometric multiplicity of an eigenvalue λ_i (i.e.,
dim V_{λ_i}) belonging to a matrix A ∈ M_n(ℂ) is the number of elementary
divisors of the characteristic matrix xI - A that correspond to λ_i. In other
words, the number of basic Jordan blocks (i.e., hypercompanion matrices)
belonging to λ_i in the Jordan canonical form of A is the same as the geometric
multiplicity of λ_i.

Proof Suppose that there are n_i elementary divisors belonging to λ_i, and let
{H_{i1}, . . . , H_{in_i}} be the corresponding hypercompanion matrices. By
suitably numbering the eigenvalues, we may write the Jordan form of A as

A = H_{11} ⊕ ··· ⊕ H_{1n_1} ⊕ ··· ⊕ H_{r1} ⊕ ··· ⊕ H_{rn_r}

where we assume that there are r distinct eigenvalues of A. For definiteness,
let us arbitrarily consider the eigenvalue λ_1 and look at the matrix λ_1I - A.
Since λ_1 - λ_i ≠ 0 for i ≠ 1, this matrix takes the form

λ_1I - A = B_{11} ⊕ ··· ⊕ B_{1n_1} ⊕ J_{21} ⊕ ··· ⊕ J_{2n_2} ⊕ ··· ⊕ J_{r1} ⊕ ··· ⊕ J_{rn_r}

where each B_{1j} is of the form

\[ \begin{pmatrix}
   0 & -1 & 0 & \cdots & 0 \\
   0 & 0 & -1 & \cdots & 0 \\
   \vdots & \vdots & \vdots & & \vdots \\
   0 & 0 & 0 & \cdots & -1 \\
   0 & 0 & 0 & \cdots & 0
   \end{pmatrix} \]

and each J_{ij} looks like

\[ \begin{pmatrix}
   \lambda_1 - \lambda_i & -1 & 0 & \cdots & 0 \\
   0 & \lambda_1 - \lambda_i & -1 & \cdots & 0 \\
   \vdots & \vdots & \vdots & & \vdots \\
   0 & 0 & 0 & \cdots & -1 \\
   0 & 0 & 0 & \cdots & \lambda_1 - \lambda_i
   \end{pmatrix} . \]

It should be clear that each J_{ij} is nonsingular (since they are all
equivalent to the identity matrix of the appropriate size), and that each
B_{1j} has rank equal to one less than its size. Since A is of size n, this
means that the rank of λ_1I - A is n - n_1 (just look at the number of linearly
independent rows in λ_1I - A). But from Theorem 5.6 we have

dim V_{λ_1} = dim Ker(λ_1I - A) = nul(λ_1I - A) = n - r(λ_1I - A) = n_1 .

In other words, the geometric multiplicity of λ_1 is equal to the number of
hypercompanion matrices corresponding to λ_1 in the Jordan form of A. Since
λ_1 could have been any of the eigenvalues, we are finished. ∎

Example 8.10 Suppose A ∈ M_6(ℂ) has characteristic polynomial

Δ_A(x) = (x - 2)^4(x - 3)^2

and minimal polynomial

m(x) = (x - 2)^2(x - 3)^2 .

Then A has eigenvalue λ_1 = 2 with multiplicity 4, and λ_2 = 3 with
multiplicity 2, and these must lie along the diagonal of the Jordan canonical
form. We know that (see the proof of the corollary to Theorem 8.10)

Δ_A(x) = m(x)q_{n-1}(x) ··· q_1(x)

where q_n(x) = m(x), . . . , q_1(x) are the similarity invariants of A, and that
the elementary divisors of xI - A are the powers of the prime factors of the
q_i(x). What we do not know however, is whether the set of elementary divisors
of xI - A is {(x - 2)^2, (x - 3)^2, (x - 2)^2} or
{(x - 2)^2, (x - 3)^2, x - 2, x - 2}.

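Using Theorem 8.18, we then see that the only possible Jordan canonical
forms are (up to the order of the blocks)

\[ \begin{pmatrix}
   2 & 1 & & & & \\
   0 & 2 & & & & \\
   & & 2 & 1 & & \\
   & & 0 & 2 & & \\
   & & & & 3 & 1 \\
   & & & & 0 & 3
   \end{pmatrix}
   \quad\text{or}\quad
   \begin{pmatrix}
   2 & 1 & & & & \\
   0 & 2 & & & & \\
   & & 2 & & & \\
   & & & 2 & & \\
   & & & & 3 & 1 \\
   & & & & 0 & 3
   \end{pmatrix} \]

(all entries not shown are zero). Note that in the first case, the geometric
multiplicity of λ_1 = 2 is two, while in the second case, the geometric
multiplicity of λ_1 = 2 is three. In both cases, the eigenspace corresponding
to λ_2 = 3 is of dimension 1. /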
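The geometric multiplicities quoted in Example 8.10 follow from the rank computation in the proof of Theorem 8.19, dim V_λ = n - r(λI - A). The Python sketch below applies that formula to the two candidate Jordan forms; the helpers `jordan_block`, `direct_sum`, `rank` and `geometric_multiplicity` are illustrative names, and the rank is found by ordinary Gaussian elimination over the rationals.

```python
from fractions import Fraction

def jordan_block(lam, size):
    """Basic Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(size)]
            for i in range(size)]

def direct_sum(blocks):
    """Block diagonal matrix built from a list of square blocks."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M

def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def geometric_multiplicity(J, lam):
    """dim Ker(lam*I - J) = n - rank(lam*I - J)."""
    n = len(J)
    shifted = [[(lam if i == j else 0) - J[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# Elementary divisors {(x-2)^2, (x-2)^2, (x-3)^2} and {(x-2)^2, x-2, x-2, (x-3)^2}.
J1 = direct_sum([jordan_block(2, 2), jordan_block(2, 2), jordan_block(3, 2)])
J2 = direct_sum([jordan_block(2, 2), jordan_block(2, 1), jordan_block(2, 1),
                 jordan_block(3, 2)])

for J in (J1, J2):
    print(geometric_multiplicity(J, 2), geometric_multiplicity(J, 3))
```

Each count equals the number of basic Jordan blocks belonging to the eigenvalue, as Theorem 8.19 asserts.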
Example 8.11 Let us determine all possible Jordan canonical forms for the
matrix A ∈ M_3(ℂ) given by

\[ A = \begin{pmatrix} 2 & a & b \\ 0 & 2 & c \\ 0 & 0 & -1 \end{pmatrix} . \]


The characteristic polynomial for A is easily seen to be

Δ_A(x) = (x - 2)^2(x + 1)

and hence (by Theorem 7.12) the minimal polynomial is either the same as
Δ_A(x), or is just (x - 2)(x + 1). If m(x) = Δ_A(x), then (using Theorem 8.18
again) the Jordan form must be

\[ \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix} \]

while in the second case, it must be

\[ \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix} . \]

If A is to be diagonalizable, then (either by Theorem 7.26 or the fact that the
Jordan form in the second case is already diagonal) we must have the second
case, and hence

\[ 0 = m(A) = (A - 2I)(A + I)
     = \begin{pmatrix} 0 & 3a & ac \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]

so that A will be diagonalizable if and only if a = 0. /
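The product (A - 2I)(A + I) in Example 8.11 is easy to confirm for particular numbers. The following Python sketch (the function name `m_of_A` and the sample values are illustrative) multiplies the two factors for given a, b, c and shows that the result vanishes exactly when a = 0.

```python
def matmul(X, Y):
    """Multiply two 3 x 3 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def m_of_A(a, b, c):
    """(A - 2I)(A + I) for A = [[2, a, b], [0, 2, c], [0, 0, -1]]."""
    A_minus_2I = [[0, a, b], [0, 0, c], [0, 0, -3]]
    A_plus_I = [[3, a, b], [0, 3, c], [0, 0, 0]]
    return matmul(A_minus_2I, A_plus_I)

def is_zero(M):
    return all(x == 0 for row in M for x in row)

# a = 0: the product vanishes whatever b and c are, so A is diagonalizable.
print(is_zero(m_of_A(0, 5, -7)))
# a != 0: the (1,2) entry 3a survives, so m(x) = (x - 2)^2 (x + 1).
print(m_of_A(4, 5, -7))
```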

As another application of Theorem 8.16 we have the following useful
result. Note that here the field ℱ can be either ℝ or ℂ, and need not be
algebraically closed in general.
Theorem 8.20 Suppose B_i ∈ M_{n_i}(ℱ) for i = 1, . . . , r and let A =
B_1 ⊕ ··· ⊕ B_r ∈ M_n(ℱ) (so that n = Σ_{i=1}^{r} n_i). Then the set of
elementary divisors of xI - A is the totality of elementary divisors of all the
xI - B_i taken together.

Proof We prove the theorem for the special case of A = B_1 ⊕ B_2. The general
case follows by an obvious induction argument. Let S = {e_1(x), . . . , e_m(x)}
denote the totality of elementary divisors of xI - B_1 and xI - B_2 taken
together. Thus, the elements of S are powers of prime polynomials. Following
the method discussed at the end of Theorem 8.8, we multiply together the
highest powers of all the distinct primes that appear in S to obtain a
polynomial which we denote by q_n(x). Deleting from S those e_i(x) that we just
used, we now multiply together the highest powers of all the remaining distinct
primes to obtain q_{n-1}(x). We continue this procedure until all the elements
of S are exhausted, thereby obtaining the polynomials q_{k+1}(x), . . . ,
q_n(x). Note that our construction guarantees that q_j(x)|q_{j+1}(x) for j =
k + 1, . . . , n - 1. Since f_n(x) = q_1(x) ··· q_n(x), it should also be clear
that

\[ \sum_{i=k+1}^{n} \deg q_i(x) = n . \]

Denote the companion matrix C(q_i(x)) by simply C_i, and define the matrix

Q = C_{k+1} ⊕ ··· ⊕ C_n ∈ M_n(ℱ) .

Then

xI - Q = (xI - C_{k+1}) ⊕ ··· ⊕ (xI - C_n) .

But according to Theorem 8.14, xI - C_i ~ diag(1, . . . , 1, q_i(x)), and hence

xI - Q ~ diag(1, . . . , 1, q_{k+1}(x)) ⊕ ··· ⊕ diag(1, . . . , 1, q_n(x)) .

Then (since the Smith form is unique) the nonunit similarity invariants of Q
are just the q_j(x) (for j = k + 1, . . . , n), and hence (by definition of
elementary divisor) the elementary divisors of xI - Q are exactly the
polynomials in S. Then by Theorem 8.16, Q is similar to the direct sum of the
companion matrices of all the polynomials in S.

On the other hand, Theorem 8.16 also tells us that B_1 and B_2 are each
similar to the direct sum of the companion matrices of the elementary divisors
of xI - B_1 and xI - B_2 respectively. Therefore B_1 ⊕ B_2 = A is similar to
the direct sum of the companion matrices of all the polynomials in S. We now
see that A is similar to Q, and hence (by Theorem 8.9, Corollary 1) xI - A and
xI - Q have the same elementary divisors. Since the elementary divisors of
xI - Q are just the polynomials in S, and S was defined to be the totality of
elementary divisors of xI - B_1 and xI - B_2, the proof is complete. ∎
The notion of "uniqueness" in Theorem 8.18 is an assertion that the Jordan
form is "uniquely defined" or "well-defined." Suppose A ∈ M_n(ℂ) has Jordan
form H_1 ⊕ ··· ⊕ H_p where each H_i is a basic Jordan block, and suppose that
G_1 ⊕ ··· ⊕ G_q is any other matrix similar to A which is a direct sum of basic
Jordan blocks. Then it follows from Theorem 8.20 that the G_i must, except for
order, be exactly the same as the H_i (see Exercise 8.6.4). We state this in
the following corollary to Theorem 8.20.

Corollary (Uniqueness of the Jordan form) Suppose A ∈ M_n(ℂ), and let
both G = G_1 ⊕ ··· ⊕ G_p and H = H_1 ⊕ ··· ⊕ H_q be similar to A, where each
G_i and H_i is a basic Jordan block. Then p = q and, except for order, the G_i
are the same as the H_i.

We saw in Section 7.5 that if a vector space V is the direct sum of
T-invariant subspaces $W_i$ (where $T \in L(V)$), then the matrix representation A of
T is the direct sum of the matrix representations of $T_i = T|W_i$ (Theorem 7.20).
Another common way of describing this decomposition of A is the following.
We say that a matrix is reducible over $\mathcal{F}$ if it is similar to a block diagonal
matrix with more than one block. In other words, $A \in M_n(\mathcal{F})$ is reducible if
there exists a nonsingular matrix $S \in M_n(\mathcal{F})$ and matrices $B \in M_p(\mathcal{F})$ and
$C \in M_q(\mathcal{F})$ with p + q = n such that $S^{-1}AS = B \oplus C$. If A is not reducible, then
we say that A is irreducible. A fundamental result is the following.

Theorem 8.21 A matrix $A \in M_n(\mathcal{F})$ is irreducible over $\mathcal{F}$ if and only if A is
nonderogatory and the characteristic polynomial $\Delta_A(x)$ is a power of a prime
polynomial. Alternatively, A is irreducible if and only if $xI - A$ has only one
elementary divisor.

Proof If A is irreducible, then $xI - A$ can have only one elementary divisor
(which is then necessarily a prime to some power) because (by Theorem 8.16)
A is similar to the direct sum of the companion matrices of all the elementary
divisors of $xI - A$. But these elementary divisors are the factors of the
similarity invariants $q_k(x)$ where $q_k(x) \mid q_{k+1}(x)$, and therefore it follows that

$$q_1(x) = \cdots = q_{n-1}(x) = 1 .$$

Hence A is nonderogatory (corollary to Theorem 8.10).

Now assume that A is nonderogatory and that $\Delta_A(x)$ is a power of a prime
polynomial. From Theorem 8.10 and its corollary we know that
$q_1(x) = \cdots = q_{n-1}(x) = 1$, and hence $q_n(x) = m(x) = \Delta_A(x)$ is now the only
elementary divisor of $xI - A$. If A were reducible, then (in the above notation)
it would be similar over $\mathcal{F}$ to a matrix of the form $B \oplus C = S^{-1}AS$, and by
Corollary 1 of Theorem 8.9, it would then follow that $xI - A$ has the same
elementary divisors as $xI - (B \oplus C) = (xI - B) \oplus (xI - C)$. Note that by the
corollary to Theorem 8.8, $xI - A$ and $S^{-1}(xI - A)S = xI - S^{-1}AS$ have the same
elementary divisors. But $xI - B$ and $xI - C$ necessarily have at least one
elementary divisor each (since their characteristic polynomials are nonzero),
and (by Theorem 8.20) the elementary divisors of $xI - S^{-1}AS$ are the totality of
the elementary divisors of $xI - B$ plus those of $xI - C$. This contradicts the
fact that $xI - A$ has only one elementary divisor, and therefore A must be
irreducible. ∎

For example, we see from Theorem 8.17 that the hypercompanion matrix
$H((x - a)^k)$ is always irreducible. One consequence of this is that the Jordan
canonical form of a matrix is the "simplest" in the sense that there is no
similarity transformation that will further reduce any of the blocks on the
diagonal. Similarly, since any elementary divisor is a power of a prime
polynomial, we see from Theorem 8.14 that the companion matrix of an elementary
divisor is always irreducible. Thus the rational canonical form cannot be
further reduced either. Note that the rational canonical form of a matrix
$A \in M_n(\mathbb{C})$ will have the same "shape" as the Jordan form of A. In other words,
both forms will consist of the same number of blocks of the same size on the
diagonal.

In Sections 7.2 and 7.7 we proved several theorems that showed some of 
the relationships between eigenvalues and diagonalizability. Let us now relate 
what we have covered in this chapter to the question of diagonalizability. It is 
easiest to do this in the form of two simple theorems. The reader should note
that the companion matrix of a linear polynomial $x - a_0$ is just the 1 x 1 matrix
$(a_0)$.

Theorem 8.22 A matrix $A \in M_n(\mathcal{F})$ is similar over $\mathcal{F}$ to a diagonal matrix
$D \in M_n(\mathcal{F})$ if and only if all the elementary divisors of $xI - A$ are linear.

Proof If the elementary divisors of $xI - A$ are linear, then each of the
corresponding companion matrices consists of a single scalar, and hence the
rational canonical form of A will be diagonal (Theorem 8.16). Conversely, if
A is similar to a diagonal matrix D, then $xI - A$ and $xI - D$ will have the same
elementary divisors (Theorem 8.9, Corollary 1). Writing $D = D_1 \oplus \cdots \oplus D_n$
where $D_i = (d_i)$ is just a 1 x 1 matrix, we see from Theorem 8.20 that the
elementary divisors of $xI - D$ are the linear polynomials $\{x - d_1, \dots, x - d_n\}$
(since the elementary divisor of $xI - D_i = (x - d_i)$ is just $x - d_i$). ∎



Theorem 8.23 A matrix $A \in M_n(\mathcal{F})$ is similar over $\mathcal{F}$ to a diagonal matrix
$D \in M_n(\mathcal{F})$ if and only if the minimal polynomial for A has distinct linear
factors in $\mathcal{P} = \mathcal{F}[x]$.

Proof Recall that the elementary divisors of a matrix in $M_n(\mathcal{P})$ are the
powers of prime polynomials that factor the invariant factors $q_k(x)$, and
furthermore, that $q_k(x) \mid q_{k+1}(x)$. Then all the elementary divisors of such a
matrix will be linear if and only if its invariant factor of highest degree has
distinct linear factors in $\mathcal{P}$. But by Theorem 8.10, the minimal polynomial for
$A \in M_n(\mathcal{F})$ is just its similarity invariant of highest degree (i.e., the
invariant factor of highest degree of $xI - A \in M_n(\mathcal{P})$). Then applying
Theorem 8.22, we see that A will be diagonalizable if and only if the minimal
polynomial for A has distinct linear factors in $\mathcal{P}$. ∎

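While it is certainly not true that any $A \in M_n(\mathbb{C})$ is similar to a diagonal
matrix, it is an interesting fact that A is similar to a matrix in which the
off-diagonal entries are arbitrarily small. To see this, we first put A into its
Jordan canonical form J. In other words, we have

$$J = S^{-1}AS = \begin{pmatrix}
j_{11} & j_{12} & & & \\
 & j_{22} & j_{23} & & \\
 & & \ddots & \ddots & \\
 & & & j_{n-1\,n-1} & j_{n-1\,n} \\
 & & & & j_{nn}
\end{pmatrix} .$$

If we now define the matrix $T = \mathrm{diag}(1, \delta, \delta^2, \dots, \delta^{n-1})$, then we leave it
to the reader to show that

$$T^{-1}JT = (ST)^{-1}A(ST) = \begin{pmatrix}
j_{11} & \delta j_{12} & & & \\
 & j_{22} & \delta j_{23} & & \\
 & & \ddots & \ddots & \\
 & & & j_{n-1\,n-1} & \delta j_{n-1\,n} \\
 & & & & j_{nn}
\end{pmatrix} .$$

By choosing $\delta$ as small as desired, we obtain the form claimed.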
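
The scaling argument above can be checked numerically. The following is a
minimal sketch, assuming a small hand-picked Jordan block and exact rational
arithmetic; the function name `scale_off_diagonal` is ours. It uses the fact
that the (i, k) entry of $T^{-1}JT$ is $\delta^{k-i}$ times the (i, k) entry of J.

```python
from fractions import Fraction

def scale_off_diagonal(J, delta):
    """Return T^{-1} J T for T = diag(1, delta, ..., delta^(n-1))."""
    n = len(J)
    delta = Fraction(delta)
    # Entry (i, k) of T^{-1} J T is delta^(k - i) * J[i][k].
    return [[J[i][k] * delta ** (k - i) for k in range(n)] for i in range(n)]

# A single 3 x 3 basic Jordan block with eigenvalue 2 (illustrative choice).
J = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 2]]

scaled = scale_off_diagonal(J, Fraction(1, 100))
for row in scaled:
    print(row)
```

The diagonal is unchanged while each superdiagonal 1 is replaced by
$\delta = 1/100$; any smaller $\delta$ works the same way.
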
Exercises 

1. If all the eigenvalues of $A \in M_n(\mathcal{F})$ lie in $\mathcal{F}$, show that the Jordan
canonical form of A has the same "block structure" as its rational canonical
form.

2. Prove Theorem 7.25 using the Jordan canonical form (Theorem 8.18).

3. Prove Theorem 7.26 using the Jordan canonical form. 

4. Finish proving the corollary to Theorem 8.20. 

5. State and prove a corollary to Theorem 8.16 that is the analogue of the 
corollary to Theorem 8.20. 

6. (a) Suppose a matrix A has characteristic polynomial

$$\Delta_A(x) = (x - 2)^4 (x - 3)^3$$

and minimal polynomial

$$m(x) = (x - 2)^2 (x - 3)^2 .$$

What are the possible Jordan forms for A?

(b) Suppose A has characteristic polynomial Aa(x) = (x - 2)^(x - Sf'. 
What are the possible Jordan forms? 

7. Find all possible Jordan forms for those matrices with characteristic and 
minimal polynomials given by: 

(a) A(x) = (x - 2)4(x - 3)2 and m(x) = (x - - 3>f. 

(b) A(x) = (x - 7)^ and m(x) = (x - if. 

(c) A(x) = (x - if and m(x) = (x - if. 

(d) A(x) = (x - 3)'^(x - 5)"^ and m(x) = (x - 3,f{x - 5f. 

8. Show that every complex matrix is similar to its transpose. 

9. Is it true that all complex matrices $A \in M_n(\mathbb{C})$ with the property that
$A^n = I$ but $A^k \ne I$ for k < n are similar? Explain.

10. (a) Is it true that an eigenvalue $\lambda$ of a matrix $A \in M_n(\mathbb{C})$ has multiplicity
1 if and only if $\lambda I - A$ has rank n - 1?

(b) Suppose an eigenvalue $\lambda$ of $A \in M_n(\mathbb{C})$ is such that $r(\lambda I - A) = n - 1$.
Prove that either $\lambda$ has multiplicity 1, or else $r((\lambda I - A)^2) = n - 2$.

11. Suppose $A \in M_n(\mathbb{C})$ is idempotent, i.e., $A^2 = A$. What is the Jordan form
of A?

12. Suppose $A \in M_n(\mathbb{C})$ is such that p(A) = 0 where

p(x) = (x-2)(x-3)(x-4). 

Prove or disprove the following statements: 

(a) The minimal polynomial for A must be of degree 3. 

(b) A must be of size n < 3. 

(c) If n > 3, then the characteristic polynomial of A must have multiple 
roots. 

(d) A is nonsingular. 

(e) A must have 2, 3 and 4 as eigenvalues. 

(f) If n = 3, then the minimal and characteristic polynomials of A must
be the same.

(g) If n = 3 then, up to similarity, there are exactly 10 different choices 
for A. 

13. Recall that $A \in M_n(\mathbb{C})$ is said to be nilpotent of index k if k is the
smallest integer such that $A^k = 0$.

(a) Describe the Jordan form of A. 

Prove or disprove each of the following statements about $A \in M_n(\mathbb{C})$:

(b) A is nilpotent if and only if every eigenvalue of A is zero.

(c) If A is nilpotent, then $r(A) - r(A^2)$ is the number of elementary
divisors of A.

(d) If A is nilpotent, then r(A) - r(A ) is the number of p x p Jordan 
blocks of A with p > 1. 

(e) If A is nilpotent, then the nul(A) is the number of Jordan blocks of A 
(counting 1x1 blocks). 

(f) If A is nilpotent, then $\mathrm{nul}(A^{k+1}) - \mathrm{nul}(A^k)$ is the number of Jordan
blocks of size greater than k.

14. Suppose $A \in M_n(\mathbb{C})$ has eigenvalue $\lambda$ of multiplicity m. Prove that the
elementary divisors of A corresponding to $\lambda$ are all linear if and only if
$r(\lambda I - A) = r((\lambda I - A)^2)$.

15. Prove or disprove the following statements about matrices $A, B \in M_n(\mathbb{C})$:

(a) If either A or B is nonsingular, then AB and BA have the same
minimal polynomial.

(b) If both A and B are singular and $AB \ne BA$, then AB and BA are not
similar.

16. Suppose $A \in M_n(\mathbb{C})$, and let adj A be as in Theorem 4.11. If A is
nonsingular, then $(SAS^{-1})^{-1} = SA^{-1}S^{-1}$ implies that
$\mathrm{adj}(SAS^{-1}) = S(\mathrm{adj}\, A)S^{-1}$ by Theorem 4.11. By using "continuity" arguments,
it is easy to show that this identity is true even if A is singular. Using this
fact and the Jordan form, prove:

(a) If det A = 0 but $\mathrm{Tr}(\mathrm{adj}\, A) \ne 0$, then 0 is an eigenvalue of A with
multiplicity 1.

(b) If det A = 0 but $\mathrm{Tr}(\mathrm{adj}\, A) \ne 0$, then r(A) = n - 1.

8.7 CYCLIC SUBSPACES * 

It is important to realize that the Jordan form can only be found in cases where 
the minimal polynomial is factorable into linear polynomials (for example, if 
the base field is algebraically closed). On the other hand, the rational canoni- 
cal form is valid over non-algebraically closed fields. In order to properly 
present another way of looking at the rational canonical form, we first intro- 
duce cyclic subspaces. Again, we are seeking a criterion for deciding when 
two matrices are similar. The clue that we now follow up on was given earlier 
in Theorem 7.37. 

Let $V \ne 0$ be a finite-dimensional vector space over an arbitrary field $\mathcal{F}$,
and suppose $T \in L(V)$. We say that a nonzero T-invariant subspace Z of V is
T-cyclic if there exists a nonzero $v \in Z$ and an integer $k \ge 0$ such that
Z is spanned by the set $\{v, T(v), \dots, T^k(v)\}$. An equivalent way of defining
T-cyclic subspaces is given in the following theorem.

Theorem 8.24 Let V be finite-dimensional and suppose $T \in L(V)$. A
subspace $Z \subset V$ is T-cyclic if and only if there exists a nonzero $v \in Z$ such
that every vector in Z can be expressed in the form f(T)(v) for some
$f(x) \in \mathcal{F}[x]$.

Proof If Z is T-cyclic, then by definition, any $u \in Z$ may be written in terms
of the set $\{v, T(v), \dots, T^k(v)\}$ as

$$u = a_0 v + a_1 T(v) + \cdots + a_k T^k(v) = (a_0 + a_1 T + \cdots + a_k T^k)(v) = f(T)(v)$$

where $f(x) = a_0 + a_1 x + \cdots + a_k x^k \in \mathcal{F}[x]$. On the other hand, if every $u \in Z$
is of the form f(T)(v), then Z must be spanned by the set $\{v, T(v), T^2(v), \dots\}$.
But Z is finite-dimensional (since it is a subset of the finite-dimensional space
V), and hence there must exist a positive integer k such that Z is spanned by
the set $\{v, T(v), \dots, T^k(v)\}$. ∎

Generalizing these definitions slightly, let $v \in V$ be nonzero. Then the set
of all vectors of the form f(T)(v) where f(x) varies over all polynomials in
$\mathcal{F}[x]$ is a T-invariant subspace called the T-cyclic subspace of V generated
by v. We denote this subspace by Z(v, T). We also denote the restriction of T
to Z(v, T) by $T_v = T|Z(v, T)$. That Z(v, T) is a subspace is easily seen since for
any $f, g \in \mathcal{F}[x]$ and $a, b \in \mathcal{F}$ we have

$$af(T)(v) + bg(T)(v) = [af(T) + bg(T)](v) = h(T)(v) \in Z(v, T)$$

where $h(x) = af(x) + bg(x) \in \mathcal{F}[x]$ (by Theorem 7.2). It should be clear that
Z(v, T) is T-invariant since any element of Z(v, T) is of the form f(T)(v), and
hence

$$T[f(T)(v)] = [Tf(T)](v) = g(T)(v)$$

where $g(x) = x f(x) \in \mathcal{F}[x]$. In addition, Z(v, T) is T-cyclic by Theorem 8.24.
In the particular case that Z(v, T) = V, then v is called a cyclic vector for T.

Let us briefly refer to Section 7.4 where we proved the existence of a
unique monic polynomial $m_v(x)$ of least degree such that $m_v(T)(v) = 0$. This
polynomial was called the minimal polynomial of the vector v. The existence
of $m_v(x)$ was based on the fact that V was of dimension n, and hence for any
$v \in V$, the n + 1 vectors $\{v, T(v), \dots, T^n(v)\}$ must be linearly dependent.
This showed that $\deg m_v(x) \le n$. Since $m_v(x)$ generates the ideal $N_T(v)$, it
follows that $m_v(x) \mid f(x)$ for any $f(x) \in N_T(v)$, i.e., where f(x) is such that
f(T)(v) = 0. Let us now show how this approach can be reformulated in terms
of T-cyclic subspaces.

Using Theorem 8.24, we see that for any nonzero $v \in V$ we may define
Z(v, T) as that finite-dimensional T-invariant subspace of V spanned by the
linearly independent set $\{v, T(v), \dots, T^{d-1}(v)\}$, where the integer $d \ge 1$ is
defined as the smallest integer such that the set $\{v, T(v), \dots, T^d(v)\}$ is
linearly dependent. This means that $T^d(v)$ must be a linear combination of the
vectors $v, T(v), \dots, T^{d-1}(v)$, and hence is of the form

$$T^d(v) = a_0 v + \cdots + a_{d-1} T^{d-1}(v)$$

for some set of scalars $\{a_i\}$. Defining the polynomial

$$m_v(x) = x^d - a_{d-1} x^{d-1} - \cdots - a_0$$

we see that $m_v(T)(v) = 0$, where $\deg m_v(x) = d$. All that really remains is to
show that if $f(x) \in \mathcal{F}[x]$ is such that f(T)(v) = 0, then $m_v(x) \mid f(x)$. This will
prove that $m_v(x)$ is the polynomial of least degree with the property that
$m_v(T)(v) = 0$.

From the division algorithm, there exists $g(x) \in \mathcal{F}[x]$ such that

$$f(x) = m_v(x) g(x) + r(x)$$

where either r(x) = 0 or $\deg r(x) < \deg m_v(x)$. Substituting T and applying this
to v we have (using $m_v(T)(v) = 0$)

$$0 = f(T)(v) = g(T) m_v(T)(v) + r(T)(v) = r(T)(v) .$$

But if $r(x) \ne 0$ with $\deg r(x) < \deg m_v(x)$, then (since Z(v, T) is T-invariant)
r(T)(v) is a linear combination of elements in the set $\{v, T(v), \dots, T^{d-1}(v)\}$,
and hence the equation r(T)(v) = 0 contradicts the assumed linear independence
of this set. Therefore we must have r(x) = 0, and hence $m_v(x) \mid f(x)$.

Lastly, we note that $m_v(x)$ is in fact the unique monic polynomial of least
degree such that $m_v(T)(v) = 0$. Indeed, if m'(x) is also of least degree such that
m'(T)(v) = 0, then the fact that $\deg m'(x) = \deg m_v(x)$ together with the result
of the previous paragraph tells us that $m_v(x) \mid m'(x)$. Thus $m'(x) = a\, m_v(x)$ for
some $a \in \mathcal{F}$, and choosing a = 1 shows that $m_v(x)$ is the unique monic
polynomial of least degree such that $m_v(T)(v) = 0$.

We summarize this discussion in the following theorem. 

Theorem 8.25 Let $v \in V$ be nonzero and suppose $T \in L(V)$. Then there
exists a unique monic polynomial $m_v(x)$ of least degree such that
$m_v(T)(v) = 0$. Moreover, for any polynomial $f(x) \in \mathcal{F}[x]$ with f(T)(v) = 0 we have
$m_v(x) \mid f(x)$.

Corollary If m(x) is the minimal polynomial for T on V, then $m_v(x) \mid m(x)$ for
every nonzero $v \in V$.

Proof By definition of minimal polynomial we know that m(T) = 0 on V, so
that in particular we have m(T)(v) = 0. But now Theorem 8.25 shows that
$m_v(x) \mid m(x)$. ∎

For ease of reference, we bring together Theorems 8.24 and 8.25 in the 
next basic result. 

Theorem 8.26 Let $v \in V$ be nonzero, suppose $T \in L(V)$, and let

$$m_v(x) = x^d - a_{d-1} x^{d-1} - \cdots - a_0$$

be the minimal polynomial of v. Then $\{v, T(v), \dots, T^{d-1}(v)\}$ is a basis for the
T-cyclic subspace Z(v, T), and hence $\dim Z(v, T) = \deg m_v(x) = d$.

Proof From the way that $m_v(x)$ was constructed, the vector $T^d(v)$ is the first
vector in the sequence $\{v, T(v), T^2(v), \dots\}$ that is a linear combination of the
preceding vectors. This means that the set $S = \{v, T(v), \dots, T^{d-1}(v)\}$ is
linearly independent. We must now show that f(T)(v) is a linear combination of
the elements of S for every $f(x) \in \mathcal{F}[x]$.

Since $m_v(T)(v) = 0$ we have $T^d(v) = \sum_{i=0}^{d-1} a_i T^i(v)$. Therefore

$$T^{d+1}(v) = \sum_{i=0}^{d-2} a_i T^{i+1}(v) + a_{d-1} T^d(v)
 = \sum_{i=0}^{d-2} a_i T^{i+1}(v) + a_{d-1} \sum_{i=0}^{d-1} a_i T^i(v) .$$

This shows that $T^{d+1}(v)$ is a linear combination of the elements of S. We can
clearly continue this process for any $T^k(v)$ with $k \ge d$, and therefore f(T)(v) is
a linear combination of $v, T(v), \dots, T^{d-1}(v)$ for every $f(x) \in \mathcal{F}[x]$. Thus S is a
basis for the T-cyclic subspace of V generated by v. ∎
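
The construction of $m_v(x)$ and of the basis in Theorem 8.26 is effectively an
algorithm: keep applying T until the new vector depends linearly on the ones
already found. The following sketch, which assumes exact rational arithmetic and
uses helper names of our own choosing (`mat_vec`, `solve_in_span`,
`vector_min_poly`), returns the basis $\{v, T(v), \dots, T^{d-1}(v)\}$ together with
the scalars $a_0, \dots, a_{d-1}$. The matrix and vectors are made-up test data.

```python
from fractions import Fraction

def mat_vec(T, v):
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def solve_in_span(basis, target):
    """Return c with sum(c[i] * basis[i]) == target, or None if impossible."""
    n, m = len(target), len(basis)
    # Augmented matrix: columns are the basis vectors, last column the target.
    rows = [[Fraction(basis[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    pivots, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[m] != 0 for row in rows[r:]):
        return None  # target is not in the span of the basis
    coeffs = [Fraction(0)] * m
    for i, c in enumerate(pivots):
        coeffs[c] = rows[i][m]
    return coeffs

def vector_min_poly(T, v):
    """For nonzero v, return (basis, a) with basis = [v, Tv, ..., T^(d-1)v]
    and T^d v = a[0] v + ... + a[d-1] T^(d-1) v."""
    basis = [list(v)]
    while True:
        nxt = mat_vec(T, basis[-1])
        a = solve_in_span(basis, nxt)
        if a is not None:
            return basis, a
        basis.append(nxt)

T = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
basis, a = vector_min_poly(T, [0, 1, 1])
print(len(basis), a)      # d and the coefficients a_0, ..., a_(d-1)
basis2, a2 = vector_min_poly(T, [1, 0, 0])
print(len(basis2), a2)
```

For v = (0, 1, 1) the sketch finds d = 3 with $m_v(x) = x^3 - 7x^2 + 16x - 12
= (x-2)^2(x-3)$, so this v is a cyclic vector; for the eigenvector (1, 0, 0) it
finds d = 1 and $m_v(x) = x - 2$.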

The following example will be used in the proof of the elementary divisor 
theorem given in the next section. 

Example 8.12 Suppose that the minimal polynomial of v is given by
$m_v(x) = p(x)^n$ where p(x) is a monic prime polynomial of degree d. Defining
W = Z(v, T), we will show that $p(T)^s(W)$ is a T-cyclic subspace generated by
$p(T)^s(v)$, and is of dimension d(n - s) if s < n, and dimension 0 if $s \ge n$. It
should be clear that $p(T)^s(W)$ is a T-cyclic subspace since every element of W
is of the form f(T)(v) for some $f(x) \in \mathcal{F}[x]$ and W is T-invariant.

Since p(x) is of degree d, we see that $\deg m_v(x) = \deg p(x)^n = dn$ (see
Theorem 6.2(b)). From Theorem 8.26, it then follows that W has the basis
$\{v, T(v), \dots, T^{dn-1}(v)\}$. This means that any $w \in W$ may be written as

$$w = a_0 v + a_1 T(v) + \cdots + a_{dn-1} T^{dn-1}(v)$$

for some set of scalars $a_i$. Applying $p(T)^s$ to w we have

$$p(T)^s(w) = a_0 p(T)^s(v) + \cdots + a_i [T^i p(T)^s](v) + \cdots + a_{dn-1} [T^{dn-1} p(T)^s](v) .$$

But $m_v(T)(v) = p(T)^n(v) = 0$ where $\deg m_v(x) = dn$, and $\deg p(x)^s = ds$.
Therefore, if $s \ge n$ we automatically have $p(T)^s(w) = 0$ so that $p(T)^s(W)$ is of
dimension 0. If s < n, then the maximum value of i in the expression for
$p(T)^s(w)$ comes from the requirement that i + ds < dn which is equivalent to
i < d(n - s). This leaves us with

$$p(T)^s(w) = a_0 [p(T)^s(v)] + \cdots + a_{d(n-s)-1} T^{d(n-s)-1} [p(T)^s(v)]$$

and we now see that any element in $p(T)^s(W)$ is a linear combination of the
terms $a_i T^i [p(T)^s(v)]$ for i = 0, . . . , d(n - s) - 1. Therefore if s < n, this shows
that $p(T)^s(W)$ is a T-cyclic subspace of dimension d(n - s) generated by
$p(T)^s(v)$. //
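
As a sanity check of the dimension count in Example 8.12, the sketch below
takes a concrete case that is our own choice and not from the text:
$p(x) = x^2 + 1$ over the rationals (so d = 2), n = 2, and T given by the
companion matrix of $p(x)^2 = x^4 + 2x^2 + 1$ with cyclic vector $v = e_1$, so that
W is the whole space. It computes the rank of $p(T)^s$ for s = 0, 1, 2, 3, which
is the dimension of $p(T)^s(W)$.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Companion matrix of x^4 + 2x^2 + 1 (last column holds a_0, ..., a_3
# in the convention m(x) = x^4 - a_3 x^3 - a_2 x^2 - a_1 x - a_0).
C = [[0, 0, 0, -1],
     [1, 0, 0, 0],
     [0, 1, 0, -2],
     [0, 0, 1, 0]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# P = p(C) = C^2 + I.
C2 = mat_mul(C, C)
P = [[C2[i][j] + I4[i][j] for j in range(4)] for i in range(4)]

d, n = 2, 2
dims, power = [], I4
for s in range(4):
    dims.append(rank(power))      # dimension of p(T)^s(W)
    power = mat_mul(power, P)

expected = [d * (n - s) if s < n else 0 for s in range(4)]
print(dims, expected)
```

Both lists agree, matching the claim that the dimension is d(n - s) for s < n
and 0 for $s \ge n$ in this particular case.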

In Section 7.4 we showed that the minimal polynomial for T was the
unique monic generator of the ideal $N_T = \bigcap_{v \in V} N_T(v)$. If we restrict ourselves
to the subspace Z(v, T) of V then, as we now show, it is true that the minimal
polynomial $m_v(x)$ of v is actually the minimal polynomial for $T_v = T|Z(v, T)$.

Theorem 8.27 Let Z(v, T) be the T-cyclic subspace of V generated by v.
Then $m_v(x)$ is equal to the minimal polynomial for $T_v = T|Z(v, T)$.

Proof Since Z(v, T) is spanned by $\{v, T(v), \dots, T^{d-1}(v)\}$, the fact that
$m_v(T)(v) = 0$ means that $m_v(T) = 0$ on Z(v, T) (by Theorem 7.2). If p(x) is the
minimal polynomial for $T_v$, then Theorem 7.4 tells us that $p(x) \mid m_v(x)$. On the
other hand, from Theorem 7.17(a), we see that $p(T)(v) = p(T_v)(v) = 0$ since
p(x) is the minimal polynomial for $T_v$. Therefore, Theorem 8.25 shows us
that $m_v(x) \mid p(x)$. Since both $m_v(x)$ and p(x) are monic, this implies that
$m_v(x) = p(x)$. ∎

Theorem 8.27 also gives us another proof of the corollary to Theorem
8.25. Thus, since $m_v(x) = p(x)$ (i.e., the minimal polynomial for $T_v$), Theorem
7.17(b) shows that $m_v(x) \mid m(x)$. Moreover, we have the next result that ties
together these concepts with the structure of quotient spaces.

Theorem 8.28 Suppose $T \in L(V)$, let W be a T-invariant subspace of V and
let $\bar{T} \in A(\bar{V})$ be the induced linear operator on $\bar{V} = V/W$ (see Theorem 7.35).
Then the minimal polynomial $m_{\bar{v}}(x)$ for $\bar{v} \in V/W$ divides the minimal
polynomial m(x) for T.

Proof From the corollary to Theorem 8.25 we have $m_{\bar{v}}(x) \mid \bar{m}(x)$ where $\bar{m}(x)$
is the minimal polynomial for $\bar{T}$. But $\bar{m}(x) \mid m(x)$ by Theorem 7.35. ∎

Corollary Using the same notation as in Theorems 8.25 and 8.28, if the
minimal polynomial for T is of the form $p(x)^n$ where p(x) is a monic prime
polynomial, then for any $v \in V$ we have $m_v(x) = p(x)^{n_1}$ and $m_{\bar{v}}(x) = p(x)^{n_2}$ for
some $n_1, n_2 \le n$.

Proof From the above results we know that $m_v(x) \mid p(x)^n$ and $m_{\bar{v}}(x) \mid p(x)^n$. The
corollary then follows from this along with the unique factorization theorem
(Theorem 6.6) and the fact that p(x) is monic and prime. ∎

In the discussion that followed Theorem 7.16 we showed that the (unique)
minimal polynomial m(x) for $T \in L(V)$ is also the minimal polynomial $m_v(x)$
for some $v \in V$. (This is because each basis vector $v_i$ for V has its own
minimal polynomial $m_i(x)$, and the least common multiple of the $m_i(x)$ is both the
minimal polynomial for some vector $v \in V$ and the minimal polynomial for
T.) Now suppose that v also happens to be a cyclic vector for T, i.e.,
Z(v, T) = V. By Theorem 8.26 we know that

$$\dim V = \dim Z(v, T) = \deg m_v(x) = \deg m(x) .$$

However, the characteristic polynomial $\Delta_T(x)$ for T must always be of degree
equal to dim V, and hence the corollary to Theorem 7.42 (or Theorems 7.11
and 7.12) shows us that $m(x) = \Delta_T(x)$.

On the other hand, suppose that the characteristic polynomial $\Delta_T(x)$ of T is
equal to the minimal polynomial m(x) for T. Then if $v \in V$ is such that
$m_v(x) = m(x)$ we have

$$\dim V = \deg \Delta_T(x) = \deg m(x) = \deg m_v(x) .$$

Applying Theorem 8.26 again, we see that $\dim Z(v, T) = \deg m_v(x) = \dim V$,
and hence v is a cyclic vector for T. We have thus proven the following useful
result.

Theorem 8.29 Let V be finite-dimensional and suppose $T \in L(V)$. Then T
has a cyclic vector if and only if the characteristic and minimal polynomials
for T are identical. Thus the matrix representation of T is nonderogatory.

In view of Theorem 8.12, our next result should have been expected.

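Theorem 8.30 Let Z(v, T) be a T-cyclic subspace of V, let $T_v = T|Z(v, T)$
and suppose that the minimal polynomial for v is given by

$$m_v(x) = x^d - a_{d-1} x^{d-1} - \cdots - a_0 .$$

Then the matrix representation of $T_v$ relative to the basis
$v, T(v), \dots, T^{d-1}(v)$ for Z(v, T) is the companion matrix

$$C(m_v(x)) = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_0 \\
1 & 0 & \cdots & 0 & a_1 \\
0 & 1 & \cdots & 0 & a_2 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & 1 & a_{d-1}
\end{pmatrix} .$$
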

Proof Simply look at $T_v$ applied to each of the basis vectors of Z(v, T) and
note that $m_v(T)(v) = 0$ implies that $T^d(v) = a_0 v + \cdots + a_{d-1} T^{d-1}(v)$. This
yields

$$\begin{aligned}
T_v(v) &= 0v + T(v) \\
T_v(T(v)) &= 0v + 0T(v) + T^2(v) \\
&\;\;\vdots \\
T_v(T^{d-2}(v)) &= 0v + \cdots + T^{d-1}(v) \\
T_v(T^{d-1}(v)) &= T^d(v) = a_0 v + \cdots + a_{d-1} T^{d-1}(v)
\end{aligned}$$

As usual, the ith column of the matrix representation of $T_v$ is just the image
under $T_v$ of the ith basis vector of Z(v, T) (see Theorem 5.11). ∎
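
The column-by-column description in this proof translates directly into code.
The sketch below is an illustration with made-up coefficients; the function
name `companion` is ours. It builds the companion matrix in the sign convention
of Theorem 8.30 (last column $a_0, \dots, a_{d-1}$) and checks that, starting from
the first standard basis vector, repeated application of the matrix walks
through the basis and then returns the coefficient vector.

```python
def companion(a):
    """Companion matrix of m(x) = x^d - a[d-1] x^(d-1) - ... - a[0],
    with the a_i in the last column as in Theorem 8.30."""
    d = len(a)
    C = [[0] * d for _ in range(d)]
    for i in range(1, d):
        C[i][i - 1] = 1        # image of the i-th basis vector is the next one
    for i in range(d):
        C[i][d - 1] = a[i]     # T^d v = a_0 v + ... + a_(d-1) T^(d-1) v
    return C

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

a = [12, -16, 7]               # m(x) = x^3 - 7x^2 + 16x - 12 (illustrative)
C = companion(a)

v = [1, 0, 0]                  # coordinates of v in the basis v, Tv, T^2 v
krylov = [v]
for _ in range(len(a)):
    krylov.append(mat_vec(C, krylov[-1]))

for vec in krylov:
    print(vec)
```

The first three vectors printed are the standard basis vectors, and the fourth
is the coefficient vector $(a_0, a_1, a_2)$, which is exactly the statement
$T^d(v) = a_0 v + \cdots + a_{d-1} T^{d-1}(v)$ in coordinates.
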
Exercises

1. If $T \in L(V)$ and $v \in V$, prove that Z(v, T) is the intersection of all
T-invariant subspaces containing v.

2. Suppose $T \in L(V)$, and let $u, v \in V$ have relatively prime minimal
polynomials $m_u(x)$ and $m_v(x)$. Show that $m_u(x) m_v(x)$ is the minimal polynomial
of u + v.

3. Prove that Z(u, T) = Z(v, T) if and only if g(T)(u) = v where g(x) is
relatively prime to $m_u(x)$.

8.8 THE ELEMENTARY DIVISOR THEOREM *

The reader should recall from Section 7.5 that if the matrix representation A
of an operator $T \in L(V)$ is the direct sum of smaller matrices (in the
appropriate basis for V), then V is just the direct sum of T-invariant subspaces
(see Theorem 7.20). If we translate Theorem 8.16 (the rational canonical form)
into the corresponding result on the underlying space V, then we obtain the
elementary divisor theorem.

Theorem 8.31 (Elementary Divisor Theorem) Let $V \ne \{0\}$ be
finite-dimensional over an arbitrary field $\mathcal{F}$, and suppose $T \in L(V)$. Then there
exist vectors $v_1, \dots, v_r$ in V such that:

(a) $V = Z(v_1, T) \oplus \cdots \oplus Z(v_r, T)$.

(b) Each $v_i$ has minimal polynomial $p_i(x)^{n_i}$ where $p_i(x) \in \mathcal{F}[x]$ is a monic
prime.

(c) The number r of terms in the decomposition of V is uniquely
determined, as is the set of minimal polynomials $p_i(x)^{n_i}$.

Proof This is easy to prove from Theorem 8.16 (the rational canonical form)
and what we know about companion matrices and cyclic subspaces
(particularly Theorem 8.30). The details are left to the reader. ∎

From Theorem 8.26 we see that $\dim Z(v_i, T) = \deg p_i(x)^{n_i}$, and hence from
the corollary to Theorem 2.15 we have

$$\dim V = \sum_{i=1}^{r} \deg p_i(x)^{n_i} .$$

The polynomials $p_i(x)^{n_i}$ defined in Theorem 8.31 are just the elementary
divisors of $xI - T$. For example, suppose that $T \in L(V)$ and $xI - T$ has the
elementary divisors $x + 1$, $(x - 1)^2$, $x + 1$, $x^2 + 1$ over the field $\mathbb{R}$. This means
that V is a vector space over $\mathbb{R}$ with

$$V = Z(v_1, T) \oplus Z(v_2, T) \oplus Z(v_3, T) \oplus Z(v_4, T)$$

and the minimal polynomials of $v_1, v_2, v_3, v_4$ are $x + 1$, $(x - 1)^2$, $x + 1$, $x^2 + 1$
respectively. Furthermore, $T = T_1 \oplus T_2 \oplus T_3 \oplus T_4$ where $T_i = T|Z(v_i, T)$ and
the minimal polynomial for $T_i$ is just the corresponding minimal polynomial
of $v_i$ (Theorem 8.27). Note that if the field were $\mathbb{C}$ instead of $\mathbb{R}$, then $x^2 + 1$
would not be prime, and hence could not be an elementary divisor of $xI - T$.

It is important to realize that Theorem 8.31 only claims the uniqueness of
the set of elementary divisors of $xI - T$. Thus the vectors $v_1, \dots, v_r$ and
corresponding subspaces $Z(v_1, T), \dots, Z(v_r, T)$ are themselves not uniquely
determined by T. In addition, we have seen that the elementary divisors are
unique only up to a rearrangement.

It is also possible to prove Theorem 8.31 without using Theorem 8.16 or 
any of the formalism developed in Sections 8.2 - 8.7. We now present this 
alternative approach as a difficult but instructive application of quotient 
spaces, noting that it is not needed for anything else in this book. We begin 
with a special case that takes care of most of the proof. Afterwards, we will 
show how Theorem 8.31 follows from Theorem 8.32. It should also be 
pointed out that Theorem 8.32 also follows from the rational canonical form 
(Theorem 8.16). 

Theorem 8.32 Let $T \in L(V)$ have minimal polynomial $p(x)^n$ where p(x) is a
monic prime polynomial. Then there exist vectors $v_1, \dots, v_r \in V$ such that

$$V = Z(v_1, T) \oplus \cdots \oplus Z(v_r, T) .$$

In addition, each $v_i$ has corresponding minimal polynomial (i.e., order) given
by $p(x)^{n_i}$ where $n = n_1 \ge n_2 \ge \cdots \ge n_r$. Furthermore, any other decomposition
of V into the direct sum of T-cyclic subspaces has the same number r of
components and the same set of minimal polynomials (i.e., orders).

Proof Throughout this (quite long) proof, we will use the term "order" rather
than "minimal polynomial" for the sake of clarity. Furthermore, we will refer
to the T-order of a vector rather than simply the order when there is a
possible ambiguity with respect to the operator being referred to.

We proceed by induction on the dimension of V. First, if dim V = 1, then
T(V) = V and hence f(T)(V) = V for any $f(x) \in \mathcal{F}[x]$. Therefore V is T-cyclic
and the theorem holds in this case. Now assume dim V > 1, and suppose that
the theorem holds for all vector spaces of dimension less than that of V.

Since $p(x)^n$ is the minimal polynomial for T, we know that $p(T)^n(v) = 0$
for all $v \in V$. In particular, there must exist a $v_1 \in V$ such that $p(T)^n(v_1) = 0$
but $p(T)^{n-1}(v_1) \ne 0$ (or else $p(x)^{n-1}$ would be the minimal polynomial for T).
This means that $p(x)^n$ must be the T-order of $v_1$ (since the minimal polynomial
of $v_1$ is unique and monic). Now let $Z_1 = Z(v_1, T)$ be the T-invariant T-cyclic
subspace of V generated by $v_1$. We also define $\bar{V} = V/Z_1$ along with the
induced operator $\bar{T} \in A(\bar{V})$. Then by Theorem 7.35 we know that the minimal
polynomial for $\bar{T}$ divides the minimal polynomial $p(x)^n$ for T, and hence the
minimal polynomial for $\bar{T}$ is $p(x)^{n_2}$ where $n_2 \le n$. This means that $\bar{V}$ and $\bar{T}$
satisfy the hypotheses of the theorem, and hence by our induction hypothesis
(since $\dim \bar{V} < \dim V$), $\bar{V}$ must be the direct sum of $\bar{T}$-cyclic subspaces. We
thus write

$$\bar{V} = Z(\bar{v}_2, \bar{T}) \oplus \cdots \oplus Z(\bar{v}_r, \bar{T})$$

where each $\bar{v}_i$ has corresponding $\bar{T}$-order $p(x)^{n_i}$ with $n \ge n_2 \ge \cdots \ge n_r$. It is
important to remember that each of these $\bar{v}_i$ is a coset of $Z_1$ in V, and thus may
be written as $\bar{v}_i = u_i + Z_1$ for some $u_i \in V$. This means that every element of
$\bar{v}_i$ is of the form $u_i + z_i$ for some $z_i \in Z_1$.

We now claim that there exists a vector $v_2$ in the coset $\bar{v}_2$ such that the
T-order of $v_2$ is just the $\bar{T}$-order $p(x)^{n_2}$ of $\bar{v}_2$. To see this, let $w \in \bar{v}_2$ be
arbitrary so that we may write $w = u_2 + z_2$ for some $u_2 \in V$ and $z_2 \in Z_1 \subset V$. Since
$p(\bar{T})^{n_2}(\bar{v}_2) = \bar{0} = Z_1$, we have (see Theorem 7.35)

$$Z_1 = p(\bar{T})^{n_2}(\bar{v}_2) = p(\bar{T})^{n_2}(u_2 + Z_1) = p(T)^{n_2}(u_2) + Z_1$$

and hence $p(T)^{n_2}(u_2) \in Z_1$. Using the fact that $Z_1$ is T-invariant, we see that

$$p(T)^{n_2}(w) = p(T)^{n_2}(u_2) + p(T)^{n_2}(z_2) \in Z_1 .$$

Using the definition of $Z_1$ as the T-cyclic subspace generated by $v_1$, this last
result implies that there exists a polynomial $f(x) \in \mathcal{F}[x]$ such that

$$p(T)^{n_2}(w) = f(T)(v_1) . \qquad (1)$$

But $p(x)^n$ is the minimal polynomial for T, and hence (1) implies that

$$0 = p(T)^n(w) = p(T)^{n-n_2} p(T)^{n_2}(w) = p(T)^{n-n_2} f(T)(v_1) .$$

Since we showed that $p(x)^n$ is also the T-order of $v_1$, Theorem 8.25 tells us
that $p(x)^n$ divides $p(x)^{n-n_2} f(x)$, and hence there exists a polynomial
$g(x) \in \mathcal{F}[x]$ such that $p(x)^{n-n_2} f(x) = p(x)^n g(x)$. Rearranging, this may be written
as $p(x)^{n-n_2}[f(x) - p(x)^{n_2} g(x)] = 0$. Since $\mathcal{F}[x]$ is an integral domain, this
implies (see Theorem 6.2, Corollary 2)

$$f(x) = p(x)^{n_2} g(x) . \qquad (2)$$

We now define

$$v_2 = w - g(T)(v_1) . \qquad (3)$$

By definition of $Z_1$ we see that $w - v_2 = g(T)(v_1) \in Z_1$, and therefore (see
Theorem 7.30)

$$v_2 \in w + Z_1 = u_2 + z_2 + Z_1 = u_2 + Z_1 = \bar{v}_2 .$$

Since $\bar{v}_2 = u_2 + Z_1$ and $v_2 \in \bar{v}_2$, it follows that $v_2 = u_2 + z$ for some $z \in Z_1$.
Now suppose that h(x) is any polynomial such that $h(T)(v_2) = 0$. Then

$$0 = h(T)(v_2) = h(T)(u_2 + z) = h(T)(u_2) + h(T)(z)$$

so that $h(T)(u_2) = -h(T)(z) \in Z_1$ (since $Z_1$ is T-invariant). We then have

$$h(\bar{T})(\bar{v}_2) = h(\bar{T})(u_2 + Z_1) = h(T)(u_2) + Z_1 = Z_1 = \bar{0} .$$

According to Theorem 8.25, this then means that the $\bar{T}$-order of $\bar{v}_2$ divides
h(x). In particular, choosing h(x) to be the T-order of $v_2$, we see that the
T-order of $v_2$ is some multiple of the $\bar{T}$-order of $\bar{v}_2$. In other words, the T-order
of $v_2$ must equal $p(x)^{n_2} q(x)$ for some polynomial $q(x) \in \mathcal{F}[x]$. However, from
(3), (1) and (2) we have

$$\begin{aligned}
p(T)^{n_2}(v_2) &= p(T)^{n_2}[w - g(T)(v_1)] \\
&= p(T)^{n_2}(w) - p(T)^{n_2} g(T)(v_1) \\
&= f(T)(v_1) - f(T)(v_1) \\
&= 0 .
\end{aligned}$$

This shows that in fact the T-order of $v_2$ is equal to $p(x)^{n_2}$ as claimed.

In an exactly analogous manner, we see that there exist vectors $v_3, \dots, v_r$
in V with $v_i \in \bar{v}_i$ and such that the T-order of $v_i$ is equal to the $\bar{T}$-order
$p(x)^{n_i}$ of $\bar{v}_i$. For each i = 1, . . . , r we then define the T-cyclic subspaces
$Z_i = Z(v_i, T)$ where $Z_1$ was defined near the beginning of the proof. We must show
that $V = Z_1 \oplus \cdots \oplus Z_r$.

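Let $\deg p(x) = d$ so that $\deg p(x)^{n_i} = dn_i$ (see Theorem 6.2(b)). Since
$p(x)^{n_1}$ is the $T$-order of $v_1$, Theorem 8.26 shows that $Z(v_1, T)$ has basis

$$\{v_1, T(v_1), \ldots, T^{dn_1 - 1}(v_1)\} .$$

Similarly, $p(x)^{n_i}$ is also the $\bar{T}$-order of $\bar{v}_i$ for $i = 2, \ldots, r$ and
hence $Z(\bar{v}_i, \bar{T})$ has the basis

$$\{\bar{v}_i, \bar{T}(\bar{v}_i), \ldots, \bar{T}^{dn_i - 1}(\bar{v}_i)\} .$$

Since $\bar{V} = Z(\bar{v}_2, \bar{T}) \oplus \cdots \oplus Z(\bar{v}_r, \bar{T})$, we see from
Theorem 2.15 that $\bar{V}$ has basis

$$\{\bar{v}_2, \ldots, \bar{T}^{dn_2 - 1}(\bar{v}_2), \ldots, \bar{v}_r, \ldots, \bar{T}^{dn_r - 1}(\bar{v}_r)\} .$$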

Recall that $\bar{v}_i = u_i + Z_1$ and $v_i \in \bar{v}_i$. This means that
$v_i = u_i + z_i$ for some $z_i \in Z_1$ so that

$$\bar{v}_i = v_i - z_i + Z_1 = v_i + Z_1$$

and hence (see the proof of Theorem 7.35)

$$\bar{T}^m(\bar{v}_i) = \bar{T}^m(v_i + Z_1) = T^m(v_i) + Z_1 .$$

Using this result in the terms for the basis of $\bar{V}$, Theorem 7.34 shows that $V$
has the basis (where we recall that $Z_1$ is just $Z(v_1, T)$)

$$\{v_1, \ldots, T^{dn_1 - 1}(v_1), v_2, \ldots, T^{dn_2 - 1}(v_2), \ldots, v_r, \ldots, T^{dn_r - 1}(v_r)\} .$$

Therefore, by Theorem 2.15, $V$ must be the direct sum of the $Z_i = Z(v_i, T)$ for
$i = 1, \ldots, r$. This completes the first part of the proof.
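
The basis just described can be checked numerically in a small case. The following is a minimal sketch under our own assumptions, not part of the text: the field is the rationals, $p(x) = x^2 + 1$ (so $d = 2$), and $T$ is taken to be block diagonal with the companion matrices of $(x^2 + 1)^2$ and $x^2 + 1$. The helper names `mat_vec`, `rank`, `cyclic_basis` and `unit` are hypothetical.

```python
from fractions import Fraction


def mat_vec(T, v):
    """Apply the matrix T (a list of rows) to the column vector v."""
    return [sum(Fraction(a) * x for a, x in zip(row, v)) for row in T]


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cyclic_basis(T, v):
    """Return [v, T(v), T^2(v), ...], stopping before the first dependent vector."""
    basis = []
    current = [Fraction(x) for x in v]
    while rank(basis + [current]) > len(basis):
        basis.append(current)
        current = mat_vec(T, current)
    return basis


def unit(i, n):
    """The i-th standard basis vector of length n (0-based index)."""
    return [1 if j == i else 0 for j in range(n)]


# Assumed example: companion blocks of (x^2 + 1)^2 = x^4 + 2x^2 + 1 and x^2 + 1.
T = [
    [0, 0, 0, -1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, -2, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, -1],
    [0, 0, 0, 0, 1, 0],
]

basis_1 = cyclic_basis(T, unit(0, 6))  # generator of order (x^2 + 1)^2
basis_2 = cyclic_basis(T, unit(4, 6))  # generator of order x^2 + 1
print(len(basis_1), len(basis_2), rank(basis_1 + basis_2))
```

In this example the two cyclic bases have lengths $dn_1 = 4$ and $dn_2 = 2$, and together they span the whole six-dimensional space, which is the direct sum statement in this special case.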

We now turn to the uniqueness of the direct sum expansion of $V$. Note that
we have just shown that $V = Z_1 \oplus \cdots \oplus Z_r$ where each $Z_i = Z(v_i, T)$ is a
$T$-cyclic subspace of $V$. In addition, the minimal polynomial (i.e., order) of $v_i$ is
$p(x)^{n_i}$ where $p(x)$ is a monic prime polynomial of degree $d$, and $p(x)^n$ is the
minimal polynomial for $T$. Let us assume that we also have the decomposition
$V = Z'_1 \oplus \cdots \oplus Z'_s$ where $Z'_i = Z(v'_i, T)$ is a $T$-cyclic subspace of $V$, and
$v'_i$ has minimal polynomial $p(x)^{m_i}$ with $m_1 \geq \cdots \geq m_s$. (Both $v_i$ and $v'_i$ have
orders that are powers of the same polynomial $p(x)$ by the corollary to
Theorem 8.25.) We must show that $s = r$ and that $m_i = n_i$ for $i = 1, \ldots, r$.

Suppose that $n_i \neq m_i$ for at least one $i$, and let $k$ be the first integer such
that $n_k \neq m_k$ while $n_j = m_j$ for $j = 1, \ldots, k - 1$. We may arbitrarily take
$n_k > m_k$. Since $V = Z'_1 \oplus \cdots \oplus Z'_s$, any $u \in V$ may be written in the form
$u = u'_1 + \cdots + u'_s$ where $u'_i \in Z'_i$. Furthermore, since $p(T)^{m_k}$ is linear, we
see that

$$p(T)^{m_k}(u) = p(T)^{m_k}(u'_1) + \cdots + p(T)^{m_k}(u'_s)$$

and hence we may write

$$p(T)^{m_k}(V) = p(T)^{m_k}(Z'_1) \oplus \cdots \oplus p(T)^{m_k}(Z'_s) .$$

Using the definition of the $T$-cyclic subspace $Z'_k$ along with the fact that
$p(T)^{m_k}(v'_k) = 0$, it is easy to see that $p(T)^{m_k}(Z'_k) = 0$. But the inequality
$m_k \geq m_{k+1} \geq \cdots \geq m_s$ implies that $p(T)^{m_k}(Z'_i) = 0$ for
$i = k, k + 1, \ldots, s$ and hence we have

$$p(T)^{m_k}(V) = p(T)^{m_k}(Z'_1) \oplus \cdots \oplus p(T)^{m_k}(Z'_{k-1}) .$$

From Example 8.12, we see that $p(T)^{m_k}(Z'_j)$ is of dimension $d(m_j - m_k)$ for
$m_k \leq m_j$. This gives us (see the corollary to Theorem 2.15)

$$\dim p(T)^{m_k}(V) = d(m_1 - m_k) + \cdots + d(m_{k-1} - m_k) .$$

On the other hand, we have $V = Z_1 \oplus \cdots \oplus Z_r$, and since $k \leq r$ it follows
that $Z_1 \oplus \cdots \oplus Z_k \subset V$. Therefore

$$p(T)^{m_k}(V) \supset p(T)^{m_k}(Z_1) \oplus \cdots \oplus p(T)^{m_k}(Z_k)$$

and hence, since $\dim p(T)^{m_k}(Z_j) = d(n_j - m_k)$ for $m_k \leq n_j$, we have

$$\dim p(T)^{m_k}(V) \geq d(n_1 - m_k) + \cdots + d(n_{k-1} - m_k) + d(n_k - m_k) .$$

However, $n_i = m_i$ for $i = 1, \ldots, k - 1$ and $n_k > m_k$, so the right-hand side is
strictly larger than the value of $\dim p(T)^{m_k}(V)$ found above. We thus have a
contradiction in the value of $\dim p(T)^{m_k}(V)$, and hence $n_i = m_i$ for every
$i = 1, \ldots, r$ (since if $r < s$ for example, then we would have $0 = n_i \neq m_i$ for
every $i > r$). This completes the entire proof of Theorem 8.32. ∎
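
The dimension count that drives the uniqueness argument can be illustrated numerically. The sketch below makes our own simplifying assumptions, which are not from the text: $p(x) = x$ (so $d = 1$) and $T$ is nilpotent, built as a direct sum of cyclic blocks of prescribed sizes $n_1 \geq \cdots \geq n_r$. It computes $\dim T^m(V)$ for successive $m$ and compares two different lists of block sizes on a space of the same dimension. All function names are hypothetical.

```python
from fractions import Fraction


def nilpotent_block_diagonal(sizes):
    """Matrix of T with one cyclic block per size: T(e_j) = e_{j+1} inside a block."""
    n = sum(sizes)
    T = [[Fraction(0)] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for j in range(size - 1):
            T[start + j + 1][start + j] = Fraction(1)
        start += size
    return T


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def image_dimensions(T, max_power):
    """Return [dim T^0(V), dim T^1(V), ..., dim T^max_power(V)]."""
    n = len(T)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    dims = []
    for _ in range(max_power + 1):
        dims.append(rank(power))
        power = mat_mul(T, power)
    return dims


def predicted_dimensions(sizes, max_power, d=1):
    """The count used in the proof: sum of d(n_j - m) over blocks with n_j > m."""
    return [sum(d * (n_j - m) for n_j in sizes if n_j > m)
            for m in range(max_power + 1)]


dims_a = image_dimensions(nilpotent_block_diagonal([3, 2, 2]), 3)
dims_b = image_dimensions(nilpotent_block_diagonal([3, 3, 1]), 3)
print(dims_a, dims_b)
```

The two lists of sizes first differ at $k = 2$, and the computed dimensions of $T^m(V)$ differ at $m = 2$, the smaller of the two second exponents. The same operator therefore cannot have both decompositions, which is the contradiction the proof extracts in general.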

In order to prove Theorem 8.31 now, we must remove the requirement in
Theorem 8.32 that $T \in L(V)$ have minimal polynomial $p(x)^n$. For any
finite-dimensional $V \neq \{0\}$, we know that any $T \in L(V)$ has a minimal polynomial
$m(x)$ (Theorem 7.4). From the unique factorization theorem (Theorem 6.6),
we know that any polynomial can be factored into a product of prime
polynomials. We can thus always write $m(x) = p_1(x)^{n_1} \cdots p_r(x)^{n_r}$ where each
$p_i(x)$ is prime. Hence, from the primary decomposition theorem (Theorem
7.23), we then see that $V$ is the direct sum of $T$-invariant subspaces
$W_i = \mathrm{Ker}\, p_i(T)^{n_i}$ for $i = 1, \ldots, r$ such that the minimal polynomial of
$T_i = T|W_i$ is $p_i(x)^{n_i}$.

Applying Theorem 8.32 to each space $W_i$ and operator $T_i \in L(W_i)$, we see
that there exist vectors $w_{ik} \in W_i$ for $k = 1, \ldots, r_i$ such that $W_i$ is the direct
sum of the $Z(w_{ik}, T_i)$. Moreover, since each $W_i$ is $T$-invariant, each of the
$T_i$-cyclic subspaces $Z(w_{ik}, T_i)$ is also $T$-cyclic, and the minimal polynomial of
each generator $w_{ik}$ is a power of $p_i(x)$. This discussion completes the proof of
Theorem 8.31.
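
To see the whole argument in a concrete case (an illustrative example of our
own, not taken from the text), let $T$ act on a three-dimensional space with basis
$\{e_1, e_2, e_3\}$ by $T(e_1) = e_1$, $T(e_2) = e_1 + e_2$ and $T(e_3) = 2e_3$. Its minimal
polynomial is $m(x) = (x - 1)^2 (x - 2)$, so $p_1(x) = x - 1$ with $n_1 = 2$ and
$p_2(x) = x - 2$ with $n_2 = 1$. The primary decomposition gives

$$W_1 = \mathrm{Ker}\,(T - 1)^2 = \mathrm{span}\{e_1, e_2\} , \qquad W_2 = \mathrm{Ker}\,(T - 2) = \mathrm{span}\{e_3\} .$$

Since $(T - 1)(e_2) = e_1 \neq 0$ and $(T - 1)^2(e_2) = 0$, the vector $e_2$ has order
$(x - 1)^2$ and $W_1 = Z(e_2, T)$, while $W_2 = Z(e_3, T)$ with $e_3$ of order $x - 2$. Thus
$V = Z(e_2, T) \oplus Z(e_3, T)$, and each generator has order equal to a power of one of
the prime factors of $m(x)$.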

Finally, we remark that it is possible to prove the rational form of a matrix 
from this version (i.e., proof) of the elementary divisor theorem. However, we 
feel that at this point it is not terribly instructive to do so, and hence the inter- 
ested reader will have to find this approach in one of the books listed in the 
bibliography. 

Exercise 

Prove Theorem 8.31 using the rational canonical form. 